

OTHER BOOKS BY PETER HENRICI 


Discrete Variable Methods in Ordinary 
Differential Equations, Wiley, 1962 


Error Propagation for Difference Methods, Wiley, 1963 


ELEMENTS 
OF NUMERICAL 
ANALYSIS 


PETER HENRICI 
Professor of Mathematics 


Eidgenössische Technische Hochschule
Zürich, Switzerland




John Wiley & Sons, Inc., New York · London · Sydney




Copyright © 1964 by John Wiley & Sons, Inc. 

All Rights Reserved. This book or any part thereof must not be reproduced 
in any form without the written permission of the publisher. 

Library of Congress Catalog Card Number: 64-23840 

PRINTED IN THE UNITED STATES OF AMERICA 


ISBN 0 471 37241 2 


to George Elmer Forsythe 


PREFACE 


The present book originated in lecture notes for a course entitled “Nu- 
merical Mathematical Analysis” which I have taught repeatedly at the 
University of California, Los Angeles, both in regular session and in a 
Summer Institute for Numerical Analysis, sponsored regularly by the 
National Science Foundation, whose participants are selected college 
teachers expecting to teach similar courses at their own institutions. 
The prerequisites for the course are 12 units of analytic geometry and 
calculus plus 3 units of differential equations. 

The teaching of numerical analysis in a mathematics department poses 
a peculiar problem. At a time when the prime objectives in the instruc- 
tion of most mathematical disciplines are rigor and logical coherence, 
many otherwise excellent textbooks in numerical analysis still convey 
the impression that computation is an art rather than a science, and that 
every numerical problem requires its own trick for its successful solution. 
It is thus understandable that many analysts are reluctant to take much 
interest in the teaching of numerical mathematics. As a consequence 
these courses are frequently taught by instructors who are not primarily 
concerned with the mathematical aspects of the subject. Thus little is 
done to whet the computational appetite of those students who feel 
attracted to mathematics for the sake of its rich logical structure and 
clarity, and our schools fail to turn out computer-trained mathematicians 
in the large numbers demanded by modern science and technology. 

Contrary to the view of computation as an art, I have always taken 
the attitude that numerical analysis is primarily a mathematical discipline. 
Thus I have tried to stress unifying principles rather than tricks, and to
establish connections with other branches of mathematical analysis. Indeed,
if looked at in this manner, numerical analysis provides the instructor
with a wonderful opportunity to strengthen the student’s grasp
of some of the basic notions of analysis, such as the idea of a sequence, 
of a limit, of a recurrence relation, or of the concept of a definite integral. 
To achieve a balance between practical and theoretical content, I have 
made — for the first time in a numerical analysis text — a clearcut dis- 
tinction between algorithms and theorems. An algorithm is a computa- 
tional procedure; a theorem is a statement about what an algorithm does. 

In addition to standard material the book features a number of modern 
algorithms (and corresponding theorems) that are not yet found in most 
textbooks of numerical analysis, such as: the quotient-difference al- 
gorithm, Muller’s method, Romberg integration, extrapolation to the 
limit, sign wave analysis, computation of logarithms by differentiation, 
and Steffensen iteration for systems of equations. In the last chapter I
have given an elementary theory of error propagation that is sufficiently 
general to cover many algorithms of practical interest. 

Another novelty for a textbook in numerical analysis is my attempt 
to treat the theory of difference equations with the same amount of rigor 
and generality that is usually given to the theory of differential equations. 
Thus this important research tool becomes available for systematic use 
in a number of contexts. In fact, difference equations form one of the 
unifying themes of the book. 

On the basis of my teaching experience I have found it necessary to 
include a rather extensive chapter on complex numbers and polynomials. 
The modern trend of replacing the course in classical theory of equations 
by a course in linear algebra has had the rather curious consequence 
that, at the level for which this book is intended, students now are less 
familiar with the basic properties of the complex field than they used to be. 

The book contains about 300 problems of varying computational and 
analytical difficulty. (Some of the more demanding problems are marked 
by an asterisk.) In addition, a small number of research problems have 
been stated at the end of some of the chapters. Some of these problems 
are in the form of non-trivial theoretical assignments requiring library 
work. The others pose practical questions of some general interest and 
are intended to stimulate undergraduate research participation. Their 
solution usually presents a non-trivial challenge in experimental compu- 
tation. 

I have omitted all references to numerical methods in algebra and 
matrix theory, because I feel that this topic is best dealt with in a 
separate course. As its theoretical foundations are quite different, to
treat it with the same attitude that I have tried to adopt towards numerical
analysis would roughly have doubled the size of the book.

For a similar reason I have omitted material on programming and 
programming languages. It goes without saying, however, that no cur- 
riculum in numerical mathematics can be complete without permitting 
the student to acquire some experience in actual computation. At the 
Institute for Numerical Analysis, this experience is acquired in a simul- 
taneous three unit programming course (one unit lectures, two units 
laboratory). The subject matter in this book is deliberately arranged 
so that the easy-to-program algorithms occur early, and a course based 
on it can easily be synchronized with a programming course. 

A preliminary version of the book has been used on a trial basis at a 
number of schools, and I have been fortunate to receive a number of 
comments and constructive criticisms. In particular I wish to express 
my gratitude to Christian Andersen, P. J. Eberlein, Gene H. Golub, M. 
Melkanoff, Duane Pyle, T. N. Robertson, Sydney Spital, J. F. Traub, 
and Carroll Webber for suggestions which I have been able to incorporate 
in the final text. I also wish to thank Thomas Bray and Gordon Thomas 
of the Boeing Scientific Research Laboratories for assistance in planning 
some of the machine-computed examples. Finally I record my debt to 
my wife, Eleonore Henrici, for her unflinching help, far beyond the call 
of conjugal duty, in preparing both the preliminary and the final version 
of the manuscript. 

I dedicate this book to my mentor and former colleague, George E.
Forsythe, who had a decisive influence on my view of the whole area of 
mathematical computation. 


Zurich, Switzerland P. HENRICI 
June 1964 


CONTENTS 


INTRODUCTION

chapter 1 What is Numerical Analysis?
1.1 Attempt of a Definition
1.2 A Glance at Mathematical History
1.3 Polynomial Equations: An Illustration
1.4 How to Describe an Algorithm
1.5 Convergence and Stability

chapter 2 Complex Numbers and Polynomials
2.1 Algebraic Definition
2.2 Geometric Interpretation
2.3 Powers and Roots
2.4 The Complex Exponential Function
2.5 Polynomials
2.6 Multiplicity and Derivative

chapter 3 Difference Equations
3.1 Differential Equations
3.2 Difference Equations
3.3 Linear Difference Equations of Order One
3.4 Horner’s Scheme
3.5 Binomial Coefficients
3.6 Evaluation of the Derivatives of a Polynomial


PART ONE SOLUTION OF EQUATIONS

chapter 4 Iteration
4.1 Definition and Hypotheses
4.2 Convergence of the Iteration Method
4.3 The Error after a Finite Number of Steps
4.4 Accelerating Convergence
4.5 Aitken’s Δ²-Method
4.6 Quadratic Convergence
4.7 Newton’s Method
*4.8 A Non-local Convergence Theorem for Newton’s Method
4.9 Some Special Cases of Newton’s Method
4.10 Newton’s Method Applied to Polynomials
4.11 Some Modifications of Newton’s Method
4.12 The Diagonal Aitken Procedure
*4.13 A Non-local Convergence Theorem for Steffensen’s Method

chapter 5 Iteration for Systems of Equations
5.1 Notation
5.2 A Theorem on Contracting Maps
*5.3 A Bound for the Lipschitz Constant
5.4 Quadratic Convergence
5.5 Newton’s Method for Systems of Equations
5.6 The Determination of Quadratic Factors
5.7 Bairstow’s Method
5.8 Convergence of Bairstow’s Method
*5.9 Steffensen’s Iteration for Systems

chapter 6 Linear Difference Equations
6.1 Notation
6.2 Particular Solutions of the Homogeneous Equation of Order Two
6.3 The General Solution
6.4 Linear Dependence
6.5 The Non-homogeneous Equation of Order Two
6.6 Variation of Constants
6.7 The Linear Difference Equation of Order N
6.8 A System of Linearly Independent Solutions
6.9 The Backward Difference Operator

chapter 7 Bernoulli’s Method
7.1 Single Dominant Zero
7.2 Accelerating Convergence
7.3 Zeros of Higher Multiplicity
7.4 Choice of Starting Values
7.5 Two Conjugate Complex Zeros
*7.6 Sign Waves

chapter 8 The Quotient-Difference Algorithm
8.1 The Quotient-Difference Scheme
8.2 Existence of the QD Scheme
8.3 Convergence Theorems
8.4 Numerical Instability
8.5 Progressive Form of the Algorithm
8.6 Computational Checks
8.7 QD Versus Newton
*8.8 Other Applications


PART TWO INTERPOLATION AND APPROXIMATION

chapter 9 The Interpolating Polynomial
9.1 Existence of the Interpolating Polynomial
9.2 The Error of the Interpolating Polynomial
9.3 Convergence of Sequences of Interpolating Polynomials
*9.4 How to Approximate a Polynomial of Degree n by One of Degree n-1

chapter 10 Construction of the Interpolating Polynomial: Methods Using Ordinates
*10.1 Muller’s Method
*10.2 The Lagrangian Representation for Equidistant Abscissas
10.3 Aitken’s Lemma
10.4 Aitken’s Algorithm
10.5 Neville’s Algorithm
10.6 Inverse Interpolation
10.7 Iterated Inverse Interpolation

chapter 11 Construction of the Interpolating Polynomial: Methods Using Differences
11.1 Differences and Binomial Coefficients
11.2 Finalized Representations of Sequences of Interpolating Polynomials
11.3 Some Special Interpolation Formulas
*11.4 Throwback

chapter 12 Numerical Differentiation
12.1 The Error of Numerical Differentiation
12.2 Numerical Differentiation Formulas for Equidistant Abscissas
12.3 Extrapolation to the Limit
12.4 Extrapolation to the Limit: The General Case
12.5 Calculating Logarithms by Differentiation

chapter 13 Numerical Integration
13.1 The Error in Numerical Integration
13.2 Numerical Integration using Backward Differences
13.3 Numerical Integration using Central Differences
*13.4 Generating Functions for Integration Coefficients
13.5 Numerical Integration over Extended Intervals
13.6 Trapezoidal Rule with End Correction
13.7 Romberg Integration

chapter 14 Numerical Solution of Differential Equations
14.1 Theoretical Preliminaries
14.2 Numerical Integration by Taylor’s Expansion
14.3 The Taylor Algorithm
14.4 Extrapolation to the Limit
14.5 Methods of Runge-Kutta Type
14.6 Methods based on Numerical Integration: the Adams-Bashforth Method
14.7 Methods based on Numerical Integration: the Adams-Moulton Method
14.8 Numerical Stability


PART THREE COMPUTATION

chapter 15 Number Systems
15.1 Representation of Integers
15.2 Binary Fractions
15.3 Fixed Point Arithmetic
15.4 Floating Point Arithmetic

chapter 16 Propagation of Round-off Error
16.1 Introduction and Definitions
16.2 Finite Differences
16.3 Statistical Approach
16.4 A Scheme for the Study of Propagated Error
16.5 Applications

Bibliography
Index
Answers for Problems

* Sections that may be omitted at first reading without essential loss of continuity.


INTRODUCTION


chapter 1 what is numerical analysis?


1.1 Attempt of a Definition


Unlike other terms denoting mathematical disciplines, such as calculus
or linear algebra, the exact extent of the discipline called numerical
analysis is not yet clearly delineated. Indeed, as recently as twenty years
ago this term was still practically unknown. It did not become generally
used until the Institute for Numerical Analysis was founded at the University
of California in Los Angeles in 1947. But even today there are widely
diverging views of the subject. On the one hand, numerical analysis is
associated with all those activities which are more commonly known as
data processing. These activities comprise (to quote only some of the
more spectacular examples) such things as the automatic reservation of
airline seats, the automatic printing of paychecks and telephone bills, the
instantaneous computation of stock market averages, and the evaluation
and interpretation of certain medical records such as electroencephalograms.
On the other hand, the words “numerical analysis” have
connotations of endless arithmetical drudgery performed by mathematical
clerks armed with a pencil, a tremendous sheet of paper, and the
indispensable eraser.

Between these extreme views we wish to steer a middle course. As far
as this volume is concerned, we shall mean by numerical analysis the
theory of constructive methods in mathematical analysis. The emphasis is
on the word “constructive.” By a constructive method we mean a
procedure that permits us to obtain the solution of a mathematical
problem with an arbitrary precision in a finite number of steps that can be
performed rationally. (The number of steps may depend on the desired
accuracy.) Thus, the mere proof that nonexistence of the solution would
lead to a logical contradiction does not represent a constructive method.
A constructive method usually consists of a set of directions for the
performance of certain arithmetical or logical operations in predetermined
order, much as a recipe directs the housewife to perform certain chemical
operations. As with a good cookbook, this set of directions must be 
complete and unambiguous. A set of directions to perform mathe- 
matical operations designed to lead to the solution of a given problem is 
called an algorithm. The word algorithm was originally used primarily to 
denote procedures that terminate after a finite number of steps. Finite 
algorithms are suitable mainly for the solution of problems in algebra. 
The student is likely to be familiar with the following two examples: 


1. The Euclidean algorithm for finding the greatest common divisor of 
two positive integers. 

2. The Gaussian algorithm for solving a system of $n$ linear equations with
$n$ unknowns.
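
The first of these two finite algorithms can be sketched in a few lines of Python. This is a minimal illustration, not taken from the text; the function name `euclid_gcd` and the sample numbers are chosen here only for the example.

```python
def euclid_gcd(a: int, b: int) -> int:
    """Greatest common divisor of two positive integers by Euclid's algorithm."""
    if a <= 0 or b <= 0:
        raise ValueError("both arguments must be positive integers")
    # Replace (a, b) by (b, a mod b) until the remainder vanishes.
    # The second component strictly decreases, so the loop terminates
    # after a finite number of steps.
    while b != 0:
        a, b = b, a % b
    return a


print(euclid_gcd(1071, 462))  # prints 21
```

The loop stops after finitely many steps for every admissible input, which is what makes the algorithm finite in the sense used above.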


The problems occurring in analysis, however, usually cannot be solved 
in a finite number of steps. Unlike the recipes of a cookbook, the 
algorithms designed for the solution of problems in mathematical analysis 
thus necessarily consist of an infinite (although denumerable) sequence of 
operations. Only a finite number of these can be performed, of course, in 
any practical application. The idea is, however, that the accuracy of the 
answer increases with the number of steps performed. If a sufficient 
number of steps are performed, the accuracy becomes arbitrarily high. 
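
The idea that accuracy grows with the number of steps performed can be seen in a small Python sketch. The choice of problem (summing the series $1/0! + 1/1! + 1/2! + \cdots$ for the number $e$) and the name `e_partial_sum` are assumptions made here for illustration; the text itself does not single out this example.

```python
from fractions import Fraction
import math


def e_partial_sum(steps: int) -> Fraction:
    """Sum of the first `steps` terms 1/0! + 1/1! + ... of the series for e."""
    total = Fraction(0)
    term = Fraction(1)  # current term 1/k!, starting with k = 0
    for k in range(steps):
        total += term
        term /= (k + 1)  # turn 1/k! into 1/(k+1)!
    return total


# Only finitely many steps are ever carried out, but more steps give a smaller error.
for steps in (5, 10, 15):
    error = abs(math.e - float(e_partial_sum(steps)))
    print(steps, error)
```

Each partial sum is computed exactly as a rational number, so the printed error reflects the truncation of the infinite process up to the final conversion to floating point.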


Problems 


1. Formulate a (finite) algorithm for deciding whether a given positive integer 
is a prime. 

2. Formulate an algorithm for computing $\sqrt{2}$ to an arbitrary number of
decimal places. (The algorithm is necessarily infinite. Why?)


1.2 A Glance at Mathematical History 


Confronted with the definition of a constructive method given above, an 
unspoiled student is likely to ask: Is not all mathematics constructive in 
the indicated sense? 

There indeed was a time when most of the work done in mathematics 
was not only inspired by concrete questions and problems but also aimed 
directly at solving these problems in a constructive manner. This was the 
period of those classical triumphs of mathematics that fill the layman with 
awe to the present day: The prediction of celestial phenomena such as 
eclipses of the sun or the moon, or the accurate prophecy of the appearance 
of a comet. Those predictions were possible, because the solutions of the
underlying mathematical problem were not merely shown to exist, but 
were actually found by constructive methods. 




The high point of this classical algorithmic age was perhaps reached in 
the work of Leonhard Euler (born 1707 in Basel, died 1783 in St. Petersburg).
Euler was possessed of a faith in the all-embracing power of
mathematics which today appears almost naive. At the age of twenty, 
before he had ever seen the ocean, he won a competition of the Paris 
Academy of Sciences with an essay on the best way to distribute masts on 
sailing vessels. Innumerable numerical examples are dispersed in the 
(so far) seventy volumes of his collected works, showing that Euler always 
kept foremost in his mind the immediate numerical use of his formulas 
and algorithms. His infinite algorithms frequently appear in the form of 
series expansions. 

After Euler’s time, however, the faith in the numerical usefulness of an 
algorithm appears to have decreased slowly but steadily. While the 
problems subjected to mathematical investigation increased in scope and 
generality, mathematicians became interested in questions of the existence 
of their solutions rather than in their construction. It is true that up to 
1900 most existence theorems were proved by methods which we would 
call constructive; however, the computational demands made by these 
methods were such as to render absurd the idea of actually carrying 
through the construction. It is hardly conceivable, for instance, that 
Emile Picard (1856-1941) ever thought of going through the motions 
required by his iteration method for solving, say, a nonlinear partial 
differential equation. 

In view of the feeling of algorithmic impotence which must have per- 
vaded the mathematical climate at that time, it is easily understandable 
that mathematicians were increasingly inclined to use purely logical rather 
than constructive methods in their proofs. An early significant instance 
of this is the Bolzano-Weierstrass theorem (see Buck [1959], p. 10) where 
we are required infinitely many times to decide whether a given set is
finite or not. To make this decision even once is, in general, not possible 
by any constructive method. During the second half of the 19th century 
the nonconstructive, purely logical trend in mathematics was rapidly 
picking up momentum. Some main stations of this development are 
marked by the names of Dedekind (1831-1916), Cantor (1845-1918), 
Zermelo (1871-1953). In spite of some countertrends inspired by out- 
standing mathematicians such as Hermann Weyl (1885-1955), the logical 
point of view appeared to be steering towards an almost absolute victory 
near the end of the first half of the 20th century. Mathematics finally 
seemed at the threshold of making true the proud statement of Jacobi 
(1804-1851) that: “Mathematics serves but the honor of the human spirit.”

By a strange coincidence, algorithmic mathematics has been liberated 
from the vincula of numerical drudgery at the very moment when pure
mathematics finally seems to have liberated itself from the last ties of
algorithmic thought. From the beginnings of the art of computation to 
the early 1940’s the speed of computation had been increased by a factor 
of about ten, due to the invention of various computing devices which 
today seem primitive. Since the early 1940’s the speed of computation 
has increased a millionfold due to the invention of the electronic digital 
computer. To put into effect even the most complicated algorithms 
presents no difficulty today. As a consequence, the demand for algorithms
of all kinds has increased enormously. 


Problem 


3. Look up Dedekind’s axiom concerning the so-called Dedekind sections 
(see Taylor [1959], p. 447) and give three examples of its application in 
elementary calculus. 


1.3 Polynomial Equations: An Illustration 


To illustrate the types of problems and concepts we wish to deal with, 
we will consider the problem of solving polynomial equations. We are all 
familiar with the quadratic equation 


(1-1)    $x^2 + a_1 x + a_2 = 0,$


where $a_1$ and $a_2$ denote arbitrary real numbers. It is well known that the
solutions of this equation are given by the formula


(1-2)    $x = -\tfrac{1}{2} a_1 \pm \left[ \left( \tfrac{1}{2} a_1 \right)^2 - a_2 \right]^{1/2};$


in certain cases the computation of the square root leads to complex 
numbers (see chapter 2). Formula (1-2) states more than the mere 
existence of a solution of equation (1-1); in fact it indicates an algorithm 
which permits us to calculate the solution. More precisely, the problem
of computing the solution is reduced to the simpler problem of computing 
a square root. If an instrument for computing roots is available (this 
could be a table of logarithms, a computing machine especially programmed
for the purpose, or the reader of this book armed with the
algorithm he developed when solving problem 2), then any quadratic 
equation can be solved. Is the same also true for equations of higher 
degree? 
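
The reduction of the quadratic equation to a single square root can be sketched as follows. This is an illustrative sketch only; the name `solve_quadratic` is hypothetical, and Python's standard `cmath.sqrt` plays the role of the "instrument for computing roots", including the case where the root is complex.

```python
import cmath


def solve_quadratic(a1: float, a2: float) -> tuple[complex, complex]:
    """Both solutions of x^2 + a1*x + a2 = 0 by formula (1-2)."""
    half = 0.5 * a1
    # The only non-rational operation is one square root.
    # cmath.sqrt also handles a negative radicand (complex solutions).
    root = cmath.sqrt(half * half - a2)
    return (-half + root, -half - root)


print(solve_quadratic(-3.0, 2.0))  # x^2 - 3x + 2 = 0 has the solutions 2 and 1
print(solve_quadratic(0.0, 1.0))   # x^2 + 1 = 0 has the solutions i and -i
```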
Let there be given an arbitrary polynomial of degree $n$,


(1-3)    $p(x) = x^n + a_1 x^{n-1} + \cdots + a_n;$


the problem is to find the solutions of the equation $p(x) = 0$. In the
golden age of algorithmic mathematics Scipione dal Ferro (1496-1525)
and Rafaello Bombelli (L’Algebra, 1579) discovered that in the cases
$n = 3$ and $n = 4$ there exist algorithms for finding all solutions of equation
(1-3) that merely require an instrument for computing roots. Through
several centuries attempts were made to solve equations of higher than the
fourth degree in a similar manner, but all these efforts remained fruitless.
Finally Galois (1811-1832) proved, in a paper written on the eve of his
premature death in a duel, that it is not possible in the case $n > 4$ to
compute the solutions of equation (1-3) with an instrument that merely
calculates roots. This, then, is a typical instance of nonexistence of a
certain type of algorithm. Modern numerical analysis, too, knows of
such instances of nonexistence of algorithms.

If a problem cannot be solved with an algorithm of a certain type, this 
does not mean that the problem cannot be solved at all. The problem is 
merely that of discovering a new algorithm. Today there are no practical 
limits to mathematical inventiveness in the discovery of new algorithms. 
It must be admitted, though, that the talent for discovering algorithms is
in no way confined to mathematicians. Some of the most effective 
algorithms of numerical analysis have been discovered by aerodynamicists 
such as L. Bairstow (1880- ); by astronomers such as P. L. Seidel 
(1821-1896); by the meteorologist L. F. Richardson (1881-1953); and by 
the statistician A. C. Aitken (1895- ). 

The problem of finding the solutions of a polynomial equation of 
arbitrary degree has attracted mathematicians of many generations such 
as Newton (1643-1727), Bernoulli (1700-1782), Fourier (1768-1830), 
Laguerre (1834-1886), and even today significant contributions to the
problem are made almost every year. One of the simplest algorithms for 
solving equation (1-3) is due to Daniel Bernoulli. If a sequence of 
numbers $z_1, z_2, z_3, \ldots$ is determined by setting $z_k = 0$ for $k < 0$, $z_0 = 1$,
and calculating $z_k$ for $k > 0$ by means of the recurrence relation


(1-4)    $z_k = -a_1 z_{k-1} - a_2 z_{k-2} - \cdots - a_n z_{k-n} \quad (k = 1, 2, \ldots),$


then the sequence of quotients


(1-5)    $q_k = \dfrac{z_{k+1}}{z_k}$

tends, if certain hypotheses are fulfilled, to a solution of equation (1-3)
(see chapter 7). Here we have a typical example of an infinite algorithm,
for the sequence $\{z_k\}$ never terminates. The recursive nature of Bernoulli’s
process, in which the same formula (1-4) is evaluated over and over
again, also is typical of many processes of modern numerical analysis.
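
Bernoulli's process as described above can be sketched in Python. This is a minimal sketch under stated assumptions: the polynomial is taken to be monic with coefficients $a_1, \ldots, a_n$ as in (1-3), the starting values are $z_0 = 1$ and $z_k = 0$ for $k < 0$, and the function name `bernoulli_quotients` is chosen here for illustration. The conditions under which the quotients converge are not checked.

```python
def bernoulli_quotients(coeffs: list[float], steps: int) -> list[float]:
    """Quotients q_k = z_{k+1} / z_k of Bernoulli's recurrence (1-4).

    `coeffs` holds a_1, ..., a_n of the monic polynomial
    p(x) = x^n + a_1 x^(n-1) + ... + a_n.
    """
    n = len(coeffs)
    # history = [z_k, z_{k-1}, ..., z_{k-n+1}]; initially z_0 = 1, earlier values 0.
    history = [1.0] + [0.0] * (n - 1)
    z_k = 1.0  # z_0
    quotients = []
    for _ in range(steps):
        # Recurrence (1-4) for the next element of the sequence.
        z_next = -sum(a * z for a, z in zip(coeffs, history))
        if z_k == 0.0:
            raise ZeroDivisionError("q_k is undefined because z_k = 0")
        quotients.append(z_next / z_k)
        history = [z_next] + history[:-1]
        z_k = z_next
    return quotients


# p(x) = x^2 - 3x + 2 has the zeros 1 and 2; the quotients approach 2.
print(bernoulli_quotients([-3.0, 2.0], 30)[-1])
```

The same few lines are executed at every step; only the stored values of the sequence change, which is the recursive character noted in the text.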


1.4 How to Describe an Algorithm 


An algorithm is not well defined unless there can be no ambiguity 
whatsoever about the operation to be performed next. Obvious causes 
for the breakdown of an algorithm are divisions by zero, or (in the real 
domain) square roots of negative numbers. More generally, an algorithm 
will always break down if a function is to be evaluated at a point where it is
undefined. All such occurrences must be foreseen and avoided in the 
statement of the algorithm. 


EXAMPLES 
3. Let an algorithm be defined as follows: Choose $z_0$ arbitrarily in the
interval (0, 2), and compute $z_1, z_2, \ldots$ by the recurrence relation


$z_{k+1} = \dfrac{1}{2 - z_k}.$


This algorithm is not well defined, since the formula does not make sense
when $z_k = 2$, which happens to be the case, for instance, if $z_0 = \tfrac{5}{4}$ and
$k = 3$. The algorithm can be turned into a well-defined one by a statement
such as the following: “Whenever $z_k = 2$, set $z_{k+1} = 13$.”
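
A breakdown of this kind can be detected mechanically. The sketch below assumes that the recurrence of Example 3 reads $z_{k+1} = 1/(2 - z_k)$ (the printed formula is damaged in this copy, so this reading is an assumption), and it uses exact rational arithmetic so that the test $z_k = 2$ is not blurred by rounding. The name `first_breakdown` is hypothetical.

```python
from fractions import Fraction


def first_breakdown(z0: Fraction, max_steps: int):
    """Return the first index k with z_k = 2, or None if none occurs within max_steps."""
    z = z0
    for k in range(max_steps + 1):
        if z == 2:
            return k  # 1/(2 - z_k) would be a division by zero
        z = 1 / (2 - z)  # assumed recurrence z_{k+1} = 1/(2 - z_k)
    return None


print(first_breakdown(Fraction(5, 4), 10))  # prints 3
```

A well-defined version of the algorithm would replace the `return k` branch by an explicit rule for the next value, as the quoted statement in the example does.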

4. Bernoulli’s algorithm described in §1.3 is well defined to the extent
that the right-hand side of equation (1-4) always has a meaning and the
sequence $\{z_k\}$ thus always exists. However, it still could be the case that
infinitely many $z_k$’s are zero. The corresponding elements $q_k$ would then
be undefined. (Consider, for example, the polynomial $p(z) = z^2 - 1$.)
Under suitable hypotheses on the polynomial (1-3) it can be shown,
however, that the quotients $q_k$ are always defined for $k$ sufficiently large
(see chapter 7).


Many simple algorithms can be perfectly well described by means of the 
conventional symbolism of algebra, supplemented, if necessary, by English 
sentences. In this manner we shall for instance describe most of the 
algorithms given in this book. Ordinary algebraic language is not the 
only language, however, in which an algorithm can be expressed. As a 
matter of fact, recent experience has shown that for very involved 
algorithms, especially those occurring in numerical linear algebra, the 
traditional mathematical language is sometimes grossly inadequate. 

Ordinary mathematical language has one further disadvantage: It 
cannot be understood directly by computing machines without first being 
“coded.” In the interest of breaking down the communication barrier
between man and machine as well as between man and man, it becomes 
very desirable to describe algorithms in a language that can be understood 
by both. 




An algorithmic language with this property, called FORTRAN 
(= formula translator), was created around 1957 by the International 
Business Machines Corporation (IBM). It is very widely used not only 
on IBM machines, but on other machines as well. FORTRAN is a 
completely adequate tool (especially in view of some recent refinements) 
for describing and communicating to the machine the vast majority of 
algorithms currently used in computational practice. It is strongly 
suggested that the student of this book familiarize himself with FORTRAN 
and use it to solve the computational problems posed in subsequent 
chapters. Excellent introductions to FORTRAN are available 
(McCracken [1961]).

FORTRAN was designed primarily as a practical research tool. Its 
great advantages are its simplicity and its wide circulation. From a 
theoretical point of view, though, the FORTRAN language has some 
serious shortcomings, and some of its rules appear highly artificial. 
Realizing this, an international team of computer scientists gathered in 
1957 with the aim of creating a universal algorithmic language that would 
be satisfactory from a theoretical point of view and that would not suffer 
from any artificial limitations. The result of their efforts was the language 
known as ALGOL (= algorithmic language). A revised version called
ALGOL 60 has found wide acceptance, especially in Europe. At the time 
this is being written, there are not yet many machines in the United 
States equipped to read ALGOL, although there are reasons to believe 
that this will change in the future. ALGOL is described in an official 
document of the Algol committee (Naur [1960]); there are also some less 
formal, but more readable, introductions to the language (McCracken 
[1962], Schwarz [1962]).


1.5 Convergence and Stability 


Once an algorithm is properly formulated, we wish to know the exact 
conditions under which the algorithm yields the solution of the problem 
under consideration. If, as is most commonly the case, the algorithm 
results in the construction of a sequence of numbers, we wish to know the 
conditions for convergence of this sequence. The practitioner of the art 
of computation is frequently inclined to judge the performance of an 
algorithm in a purely pragmatic way: The algorithm has been tried out in 
a certain number of examples, and it has worked satisfactorily in 95 
per cent of all cases. 

Mathematicians tend to take a dim view of this type of scientific 
investigation (although it is basically the standard method of research in 
such vital disciplines as medicine and biology). It is indeed always
desirable to base a statement about the performance of a given algorithm
on logic rather than empirical evidence. Such logically provable state- 
ments are called theorems in mathematics. As we shall see, theorems 
about the convergence of algorithms can be stated in a good many cases.


EXAMPLE 


5. The following is a necessary and sufficient condition for the
convergence of Bernoulli’s method mentioned in §1.3: Among all zeros of
maximum modulus of the polynomial (1-3) there exists exactly one zero of
maximum multiplicity (see chapter 2 for an explanation of these
concepts). If this condition is fulfilled, the sequence $\{q_n\}$ converges to that
zero of maximum modulus and multiplicity.


Once the question of convergence is settled, numerous other questions 
can be asked about the performance of an algorithm. One might want 
to know, for instance, how fast the algorithm converges. Or one may 
wish to know something about the size of the error, if the algorithm is 
artificially terminated after a finite number of steps. The latter question 
can be interpreted in two ways: Either one wishes to know how big the 
error is at most, or how big the error is approximately. The answer to the 
first question is given by an error bound; the answer to the second by an 
asymptotic formula. Mathematicians, who like to think in categories, 
usually give preference to error bounds. However, an error bound which 
exceeds the true error by a factor of $10^5$ is practically useless, whereas an
approximate formula, while not representing a guaranteed bound, still 
can be very useful from a practical point of view. (Our scientists, if they 
depended on guaranteed error bounds, would never have dared to put a 
manned satellite in orbit.) Last but not least, the study of the asymptotic 
behavior frequently reveals information which enables one to speed up 
the convergence of the algorithm. Examples of this will occur in almost 
every chapter of this volume. 

Even the theoretical convergence of an algorithm does not always 
guarantee that it is practically useful. One more requirement must be 
met: The algorithm must be numerically stable. In high school we 
learned how to express the result of multiplying two six-digit decimal 
fractions (which in general has twelve digits) approximately in terms of a 
six-digit fraction by a process known as rounding. Although the resulting 
fraction is not the exact value of the product, we readily accept the minor 
inaccuracy in view of the greater manageability of the result. If several 
multiplications are to be performed in a row, rounding becomes a practical 
necessity, as it is impossible to handle an ever-increasing number of 
decimal places. In view of the fact that the individual errors due to
rounding are small, we usually assume that the accuracy of the final result
is not seriously affected by the individual rounding errors. 

Modern electronic digital computers, too, work with a limited number 
of decimal places. The number of arithmetic operations that can be 
performed per unit time, however, is about a million times as large as 
that performed in manual computation. Although the individual round- 
ing errors are still small, their cumulative effect can, in view of the large 
number of arithmetic operations performed, grow very rapidly like a 
cancer, and completely invalidate the final result. In order to be sound, 
an algorithm must remain immune to the accumulation of rounding errors. 
This immunity is called numerical stability. 
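
The accumulation of rounding errors described above can be observed directly. The following is a minimal sketch, assuming ordinary binary double-precision arithmetic (not the decimal machine arithmetic the text has in mind); the helper name `naive_sum` is hypothetical and chosen only for illustration. It adds the same number one million times and compares the result with a correctly rounded sum.

```python
import math


def naive_sum(values):
    """Add the values left to right; every addition rounds the running total."""
    total = 0.0
    for v in values:
        total += v
    return total


values = [0.1] * 1_000_000

naive = naive_sum(values)
# math.fsum returns the correctly rounded sum of the stored values,
# so it serves as the reference here.
accurate = math.fsum(values)
drift = abs(naive - accurate)

print("naive sum   :", naive)
print("accurate sum:", accurate)
print("drift       :", drift)
```

Each individual rounding error is tiny, yet the drift after a million additions is many times larger than a single rounding error. Whether such a drift is harmless or fatal depends on the algorithm, which is exactly the question a theory of numerical stability has to answer.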

The concept of numerical stability can be very well illustrated by means 
of Bernoulli’s algorithm discussed in §1.3. As mentioned above, this 
algorithm will, under certain conditions, yield the zero of maximum 
modulus of the polynomial (1-3). This algorithm is stable in the sense 
that it furnishes the zero to the same number of significant digits as are 
carried in the computation of elements of the sequence $\{z_n\}$. We shall
have the opportunity to discuss the following extension of the Bernoulli 
method: If we form the quantities 


, _ Geo — Wk 


ὼ Gu — ἔπ -- 


then under certain hypotheses (which can be specified) the sequence $\{q_n'\}$
tends to the zero of next smaller modulus of the polynomial p (see chapter 
8). We have thus again an instance of a convergent algorithm or, to put 
it differently, of a logically impeccable mathematical theorem. In spite of 
all this the algorithm just formulated is practically useless. The quantities 
$q_n'$ are affected by rounding errors to an extent which completely spoils
convergence. 

In spite of the above example, the reader should not form the impression 
that stability is an invariant property which an algorithm either has or 
does not have. In the last analysis, the stability of an algorithm depends 
on the computing machine with which it is performed. The modification 
of Bernoulli’s algorithm described above is unstable if performed on an 
ordinary computing machine but it could be rendered stable on a machine 
equipped to compute with a variable number of decimal places. In a 
similar way it is possible to increase the stability of certain algorithms in 
numerical linear algebra by performing certain crucial operations with 
increased precision. 

Rather than trying to set up absolute standards of stability, it should 
be the aim of a theory of numerical stability to predict in a quantitative 
manner the extent of the influence of the accumulation of rounding errors
if certain hypotheses are made on the individual rounding errors. Such a
theory will be descriptive rather than categorical. Ideally, it will predict 
the outcome of a numerical experiment, much as physical theories predict 
the outcome of physical experiments. A particularly useful model of 
error propagation is obtained if the individual errors are treated as if they 
were statistically independent random variables. In chapter 16 we will 
show how this model can be applied to a large variety of algorithms 
discussed elsewhere in this volume. 


chapter 2 complex numbers and polynomials 


One of the recurring topics in the following chapters will be the solution of
polynomial equations. Complex numbers are an indispensable tool for
any serious work on this problem.


2.1 Algebraic Definition 


The reader is undoubtedly aware of the fact that within the realm of 
real numbers the operations of addition, subtraction, multiplication, and 
division (except division by zero) can be carried out without restriction. 
However, the same is not true for the extraction of square roots. The 
equation 


(2-1)    $x^2 = a$


is not always soluble. If a < 0, there is no real number x which satisfies
equation (2-1), because the square of any real number is always 
nonnegative. 

Historically, complex numbers originated out of the desire to make 
equation (2-1) always solvable. This was achieved by simply postulating 
the existence of a solution also for negative values of a. Actually, only 
the solution of one special equation, namely the equation 


(2-2)    $x^2 = -1$


has to be postulated. Following Euler, we denote a solution of this
equation by i. The symbol i is called the imaginary unit; it is a “number”
satisfying

    $i^2 = -1.$

Postulating further that i can be treated as an ordinary number, we can
now solve any equation (2-1) with a < 0.


EXAMPLE 

1. A solution of $x^2 = -25$ is given by x = i5, for

    $(i5)^2 = i^2 5^2 = (-1)25 = -25.$

Another solution is x = -i5, for

    $(-i5)^2 = (-1)^2 i^2 5^2 = 1(-1)25 = -25.$


Not only special equations of the form (2-1), but any quadratic equation
of the form

(2-3)    $x^2 + 2bx + c = 0$

with real coefficients b and c can now be solved. As is well known, the
method consists in writing the term on the left in the form of a complete
square plus correction term:

    $x^2 + 2bx + c = (x + b)^2 + c - b^2.$

Equation (2-3) demands that

    $(x + b)^2 = b^2 - c.$

For $b^2 - c \ge 0$ we obtain the solutions familiar from elementary algebra.
If $b^2 - c < 0$, then $c - b^2 > 0$, and our equation is equivalent to

    $(x + b)^2 = -(c - b^2).$

According to the above, it has the “solutions”

    $x + b = \pm i\sqrt{c - b^2}$

or

    $x = -b \pm i\sqrt{c - b^2}.$


EXAMPLE 
2. The equation $x^2 + 6x + 25 = 0$ is equivalent to

    $(x + 3)^2 + 25 - 9 = 0$

or

    $(x + 3)^2 = -16.$

It therefore has the solutions

    $x = -3 \pm i4.$
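
The completing-the-square method can be sketched in a few lines of Python. This is an illustration only: the function name `solve_quadratic` is hypothetical, and the standard-library function `cmath.sqrt` is used to supply the square root of a negative number.

```python
import cmath


def solve_quadratic(b, c):
    """Solve x^2 + 2*b*x + c = 0 via (x + b)^2 = b^2 - c."""
    # cmath.sqrt returns a purely imaginary value when b^2 - c < 0.
    root = cmath.sqrt(b * b - c)
    return (-b + root, -b - root)


# Example 2 of the text: x^2 + 6x + 25 = 0, i.e. b = 3, c = 25.
x1, x2 = solve_quadratic(3, 25)
print(x1, x2)
```

Substituting either value back into $x^2 + 6x + 25$ gives zero, which is a quick way to check the result.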


Any expression of the form a + ib, where a and b are real, is called a
complex number and may henceforth be denoted by a single symbol such
as z. If z = a + ib, we call a the real part of z and b the imaginary part
of z. In symbols this relationship is expressed as follows:

    $a = \operatorname{Re} z, \qquad b = \operatorname{Im} z.$


Two complex numbers are considered equal if and only if both their real 
and their imaginary parts are equal. 

What has been gained by the introduction of complex numbers besides 
the ability of solving all quadratic equations in what at the moment may 
appear as a rather formalistic manner? Certainly nothing has been lost, 
because all real numbers are contained in the set of complex numbers. 
They are simply the complex numbers with imaginary part zero. Also, 
nothing has been spoiled, because we shall see that the ordinary rules of 
arithmetic still hold for complex numbers. 

To justify this statement, let

    $z_1 = a + ib$
    $z_2 = c + id$

be any two complex numbers. We define

(2-4)    $z_1 + z_2 = a + ib + (c + id) = a + c + i(b + d),$

         $z_1 - z_2 = a + ib - (c + id) = a - c + i(b - d).$


Two complex numbers are added (subtracted) by adding (subtracting) 
their real and imaginary parts. The role of the “neutral element,” i.e.,
of the number w satisfying z + w = z for all z, is played by the complex
number†

    $0 = 0 + i0.$


Multiplying two complex numbers formally, we obtain

    $z_1 z_2 = (a + ib)(c + id) = ac + i(ad + bc) + i^2 bd.$

But $i^2 = -1$. We thus define:

(2-5)    $z_1 z_2 = ac - bd + i(ad + bc).$


For the special case of real numbers (b = d = 0) the above definitions
reduce to the ordinary sum and product of real numbers. Moreover, the
following rules, familiar from ordinary arithmetic, still hold for any
complex numbers $z_1$, $z_2$, $z_3$:


† Two different kinds of zeros are used in this equation. On the right, we have twice
the real number zero. On the left, we have the complex number zero, whose real 
and imaginary parts are the real number zero. As no misunderstandings can arise, 
one does not try to make a graphical distinction between the two zeros. 




The commutative laws

    $z_1 + z_2 = z_2 + z_1, \qquad z_1 z_2 = z_2 z_1;$

the associative laws

    $(z_1 + z_2) + z_3 = z_1 + (z_2 + z_3), \qquad z_1(z_2 z_3) = (z_1 z_2) z_3;$

and the distributive law

    $z_1(z_2 + z_3) = z_1 z_2 + z_1 z_3.$

The proof of these relations is an immediate consequence of our definitions 
and of the corresponding rules for real numbers. 
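
The definitions (2-4) and (2-5) translate directly into code. The sketch below represents a complex number a + ib as the pair `(a, b)`; the function names `add` and `mul` are hypothetical. It spot-checks the three laws on one set of integer values, which is an illustration and not a proof.

```python
def add(z1, z2):
    """Sum of z1 = (a, b) and z2 = (c, d), following (2-4)."""
    (a, b), (c, d) = z1, z2
    return (a + c, b + d)


def mul(z1, z2):
    """Product of z1 = (a, b) and z2 = (c, d), following (2-5)."""
    (a, b), (c, d) = z1, z2
    return (a * c - b * d, a * d + b * c)


z1, z2, z3 = (1, 2), (3, -1), (-2, 5)

commutative = add(z1, z2) == add(z2, z1) and mul(z1, z2) == mul(z2, z1)
associative = (add(add(z1, z2), z3) == add(z1, add(z2, z3))
               and mul(z1, mul(z2, z3)) == mul(mul(z1, z2), z3))
distributive = mul(z1, add(z2, z3)) == add(mul(z1, z2), mul(z1, z3))

print(commutative, associative, distributive)
```

Integer components are used so that the comparisons are exact; with floating-point components the laws hold only up to rounding.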

We have yet to define the quotient $z_2/z_1$ of two complex numbers. The
real case shows that some restriction has to be imposed here: The case
$z_1 = 0$ will have to be excluded. Is it possible, however, to define $z_2/z_1$
for all $z_1 \ne 0$? To answer this question, we recall the meaning of the
quotient. In the real case we mean by x = b/a the (unique) solution of the
equation ax = b. Analogously, we understand by $z = z_2/z_1$ the solution
of

    $z_1 z = z_2.$

If z = x + iy, this amounts to

    $(a + ib)(x + iy) = c + id,$

or, after separating real and imaginary parts,

(2-6)    $ax - by = c,$
         $bx + ay = d.$


The two relations (2-6) can be regarded as a system of two linear equations
for the two unknowns x and y. This system has a unique solution for
arbitrary values of c and d if and only if its determinant is different from
zero. The determinant is

    $\begin{vmatrix} a & -b \\ b & a \end{vmatrix} = a^2 + b^2;$

it will be different from zero if and only if at least one of the numbers a
and b is different from zero, that is, if $z_1 \ne 0$. If $z_1 \ne 0$, the solution of
equation (2-6) is easily found to be

    $x = \frac{ac + bd}{a^2 + b^2}, \qquad y = \frac{ad - bc}{a^2 + b^2}.$


We thus define

(2-7)    $\frac{z_2}{z_1} = \frac{ac + bd}{a^2 + b^2} + i\,\frac{ad - bc}{a^2 + b^2}.$


The same result could have been obtained by formal manipulation, as
follows: Let

    $\frac{z_2}{z_1} = \frac{c + id}{a + ib}.$

In the fraction on the right we multiply both numerator and denominator
by a - ib. In the denominator we obtain

    $(a + ib)(a - ib) = a^2 + b^2 + i(ba - ab) = a^2 + b^2,$

and carrying out the multiplication in the numerator we again find
equation (2-7). This procedure cannot replace the proof that $z_1 z = z_2$
has the solution (2-7), because it assumes the existence of the solution;
once existence has been established, however, it is very useful for the
computation of the numerical value of $z_2/z_1$.
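
Formula (2-7) can be turned into a small routine. The sketch below again stores a + ib as the pair `(a, b)`; the name `divide` is hypothetical, and the argument order (numerator first) is a choice made here, not something fixed by the text.

```python
def divide(z2, z1):
    """Quotient z2 / z1 for z1 = (a, b) != 0 and z2 = (c, d), following (2-7)."""
    (a, b), (c, d) = z1, z2
    denom = a * a + b * b  # the determinant of the linear system (2-6)
    if denom == 0:
        raise ZeroDivisionError("the quotient is undefined for z1 = 0")
    return ((a * c + b * d) / denom, (a * d - b * c) / denom)


# (5 + 5i) / (1 + 2i): multiplying the result by 1 + 2i must give 5 + 5i again.
q = divide((5, 5), (1, 2))
reciprocal = divide((1, 0), (3, 4))
print(q, reciprocal)
```

The guard on `denom` mirrors the condition that the determinant $a^2 + b^2$ be different from zero.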


EXAMPLES 
[8 kop πε ae eM tt db tee 


oF Mun ChE att) ἡ 16 “ἢ. Ἢ 


3, 


4. The reciprocal z = 1/(a + ib) of a complex number $a + ib \ne 0$ is
calculated as follows:

    $\frac{1}{a + ib} = \frac{a - ib}{(a + ib)(a - ib)} = \frac{a}{a^2 + b^2} - i\,\frac{b}{a^2 + b^2}.$


5. To express 


in the form a+ ib. Successive applications of the above method of 
reduction yield 


l a l ] 1 
1 i μὴ ee te 
i -| ee ἘΞ. τ πες 
ἮΝ 1 ἐς Ret. i+ 1 2 
i+ 1 Fs 
Be: d Ἐπ Ἢ 
Te Pde Ee Ὁ 


Problems 


1. Express in the form a + ib 


- 7 3\2 
@ (4): 
1 8 ἢ)"; 
(c) (cos φ + isin @)*; 
I 
cos m — ising’ 


1 1 
OTT >, 1 
7+ δὶ 


(d) 


2. What are the values of z for which the complex number 


1 1 
ΤΕΣ ΤῈ ἢ: 
is undefined ? 
3. Let $s = 1 + z + z^2 + \cdots + z^n$ where z is a complex number, $z \ne 1$.
Find a closed formula for s by forming the expression s - zs.


2.2 Geometrical Interpretation 


The discussion of §2.1 will have satisfied the student that complex 
numbers can be manipulated exactly like real numbers. At the moment 
this result may seem a mere formal curiosity. However, the significance 
of complex numbers in analysis is based on the fact that the algebraic 
operations discussed in §2.1 admit simple and beautiful geometric 
interpretations. 

A complex number z = a + ib can be represented geometrically in
either of two ways:

(i) We can associate with z the point with coordinates (a, b) in the
(x, y)-plane. If the points of an (x, y)-plane are thought of as complex
numbers, that plane is called the complex plane. Its x-axis is the locus of
complex numbers that are real; it is therefore called the real axis. Its
y-axis carries the points with real part zero; it is called the imaginary axis
of the complex plane.

(ii) We can associate with a complex number z the two-dimensional
vector with components a and b. We think of this vector as a free vector,
i.e., a vector that may be shifted around arbitrarily as long as its direction
remains unchanged.


Both interpretations can be combined into one by attaching the vector z 
to the origin of the plane. It then becomes the radius vector pointing 
from the origin to the point z. 


EXAMPLE 


6. To the four complex numbers z = 1, z = i, z = —1, z = —i there 
correspond points (Fig. 2.2c) or radius vectors (Fig. 2.2d). 


If complex numbers are interpreted as vectors, addition of two complex 
numbers amounts to ordinary addition of the corresponding vectors. 
For indeed, the sum of two vectors is obtained by forming the sum of 
corresponding components, just as in the case of complex numbers 
(see Fig. 2.2e).

The difference $z_1 - z_2$ of two complex numbers can be interpreted as
the difference of the corresponding vectors. As is well known, the
difference of two vectors $z_1$ and $z_2$ can be defined by adding to $z_1$ the vector
$-z_2$, i.e., the vector which has the same length as $z_2$, but direction opposite
to $z_2$. If we attach both vectors $z_1$ and $z_2$ to the same point, then the
difference $z_1 - z_2$ corresponds to the vector pointing from the head of $z_2$
to the head of $z_1$ (see Fig. 2.2f).

Multiplication of a complex number by a real number can be similarly
interpreted. Let c be a real number, and let z = a + ib. Writing
c = c + i0, we obtain from the multiplication rule

    $cz = (c + i0)(a + ib) = ca + icb.$

The vector cz thus has the same direction as z, if c > 0, and it has the
opposite direction, if c < 0. If c = 0, cz is the zero vector. The length
of cz is always |c| times the length of z.


Figure 2.2


The question of how to interpret the product of two arbitrary complex 
numbers now arises. In order to arrive at a suitable interpretation, we 
require the notions of the absolute value and of the argument of a complex 
number. 

The absolute value or modulus of a complex number z = a + ib is
denoted by |z| and is defined as the length of the vector representing z, or
equivalently, as the distance from the origin of the point in the complex
plane representing z. It follows from Pythagoras’ theorem that

(2-8)    $|z| = \sqrt{a^2 + b^2}.$


The absolute value of a complex number z is zero if and only if z = 0. 
Otherwise the absolute value is positive. 

The argument of a complex number is defined only when $z \ne 0$. It is
denoted by arg z and denotes any angle subtended by the vector z and the
positive real axis. The angle is counted positively if a counterclockwise
rotation of the positive real axis would be required to make its direction
coincide with the direction of the vector z (see Fig. 2.2g).

The argument of a complex number is not uniquely determined. If $\varphi$
is a value of arg z, then every angle $\varphi + 2\pi k$ (k an arbitrary integer) is also
a possible value of arg z. Thus, the argument of a complex number is
determined only† up to multiples of $2\pi$.

To compute an argument $\varphi$ of $z = a + ib \ne 0$ we proceed as follows.
Letting |z| = r, we have from figure 2.2g

(2-9)    $a = r\cos\varphi, \qquad b = r\sin\varphi.$

We can use the first of these equations to determine the absolute value of $\varphi$
by means of the relations

    $\cos\varphi = \frac{a}{r}, \qquad |\varphi| \le \pi.$

If $b \ne 0$, the sign of $\varphi$ is the same as the sign of b, as follows from the
relations

    $\operatorname{sign}\varphi = \operatorname{sign}(\sin\varphi) = \operatorname{sign}\frac{b}{r} = \operatorname{sign} b,$

since r > 0. If b = 0, then a = r or a = -r. In the former case,
$\varphi = 0$. In the latter case, $\varphi = \pm\pi$. Since $\pi$ and $-\pi$ differ by $2\pi$, both
values are equally admissible values of arg z.
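
The procedure just described (absolute value of the argument from the cosine, sign from the imaginary part) can be sketched as follows. The function name `argument` is hypothetical; for b = 0 and a < 0 the sketch returns $+\pi$, one of the two equally admissible values. The clamp on the cosine is a safeguard added here against rounding and is not part of the text.

```python
import math


def argument(a, b):
    """One value of arg(a + ib) with |phi| <= pi, for a + ib != 0."""
    r = math.hypot(a, b)  # |z| as in (2-8)
    if r == 0:
        raise ValueError("arg z is undefined for z = 0")
    # cos(phi) = a / r determines |phi|; the clamp guards against rounding.
    phi = math.acos(max(-1.0, min(1.0, a / r)))
    # The sign of phi equals the sign of b (phi = +pi is chosen when b = 0 > a).
    return -phi if b < 0 else phi


print(math.degrees(argument(4, 3)))    # z = 4 + i3
print(math.degrees(argument(-4, -3)))  # z = -4 - i3
```

The standard-library call `math.atan2(b, a)` returns the same value, so it can serve as an independent check.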


† The more advanced development of the theory of complex numbers shows that it
would be unwise to restrict the argument artificially to an interval of length $2\pi$ by
stipulating a condition such as $-\pi < \varphi \le \pi$ or $0 \le \varphi < 2\pi$.


EXAMPLES 
7. Let z = 4 + i3. From (2-8)

    $|z| = \sqrt{4^2 + 3^2} = \sqrt{25} = 5.$

To compute $\varphi = \arg z$, we first use the relation

    $\cos\varphi = \frac{4}{5}$ to find $|\varphi| = 36^\circ 52'$.

Since Im z = 3 > 0, it follows that $\varphi = 36^\circ 52'$.

8. For z = -4 - i3 we likewise find |z| = 5, but in view of $\cos\varphi = -\frac{4}{5}$
we now have $|\varphi| = 143^\circ 8'$ and, since Im z = -3 < 0, $\varphi = -143^\circ 8'$.

9. The absolute value of a complex number with imaginary part zero
coincides with the absolute value (as defined for real numbers) of its real
part. Its argument is 0 (or $2\pi k$), if the real part is positive, and $\pi$
(or $\pi + 2\pi k$), if the real part is negative.


Relation (2-9) shows that every complex number $z \ne 0$ can be
represented in a unique manner as the product of a real, positive number and of
a complex number with absolute value 1. This representation is given by

(2-10)    $z = r(\cos\varphi + i\sin\varphi),$

where r = |z|, and where $\varphi$ denotes any value of arg z. The
representation (2-10) is called the polar representation or polar form of the complex
number z.

The polar representation enables us to give a geometric interpretation
of the product and the quotient of two complex numbers. Let

    $z_1 = r_1(\cos\varphi_1 + i\sin\varphi_1)$
    $z_2 = r_2(\cos\varphi_2 + i\sin\varphi_2)$

be any two nonzero complex numbers. Multiplying by the algebraic
rules of §2.1 we find

    $z_1 z_2 = r_1 r_2[\cos\varphi_1\cos\varphi_2 - \sin\varphi_1\sin\varphi_2 + i(\cos\varphi_1\sin\varphi_2 + \sin\varphi_1\cos\varphi_2)]$

or, by virtue of the addition theorems of the trigonometric functions,

    $z_1 z_2 = r_1 r_2[\cos(\varphi_1 + \varphi_2) + i\sin(\varphi_1 + \varphi_2)].$

This, however, is the polar representation of the complex number with
absolute value $r_1 r_2$ and argument $\varphi_1 + \varphi_2$. We thus have proved the
relations

(2-11)    $|z_1 z_2| = |z_1|\,|z_2|, \qquad \arg(z_1 z_2) = \arg z_1 + \arg z_2.$

The second of these is to be interpreted in the following sense: The sum
of any two admissible values of $\arg z_1$ and $\arg z_2$ yields an admissible
value of $\arg(z_1 z_2)$.
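
The relations (2-11), including the remark on admissible values, can be checked numerically. The sketch below uses Python's built-in complex type and the standard `cmath` functions `polar`, `rect`, and `phase`; the two sample numbers are arbitrary choices made for this illustration. `cmath.phase` always returns the value in $[-\pi, \pi]$, so the sum of the two arguments may differ from it by a multiple of $2\pi$.

```python
import cmath
import math

z1 = complex(-4, -3)
z2 = complex(-1, -2)

r1, phi1 = cmath.polar(z1)  # modulus and one value of the argument
r2, phi2 = cmath.polar(z2)

product = z1 * z2
# (2-11): the moduli multiply and the arguments add.
from_polar = cmath.rect(r1 * r2, phi1 + phi2)

# phi1 + phi2 is an admissible argument of the product, but not
# necessarily the one returned by cmath.phase; k counts the 2*pi shifts.
k = (phi1 + phi2 - cmath.phase(product)) / (2 * math.pi)

print(product, from_polar, k)
```

For this pair the two arguments are both below $-\pi/2$, so their sum falls outside $[-\pi, \pi]$ and differs from the value reported by `cmath.phase` by one full turn.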


EXAMPLES 

10. Let $z_2$ be a positive real number. We then have $|z_2| = z_2$, $\varphi_2 = 0$.
Multiplication of $z_1$ by $z_2$ amounts to a dilatation or contraction of the
vector $z_1$, according to whether $z_2 > 1$ or $0 < z_2 < 1$.

11. If $z_2$ is a complex number of absolute value 1, multiplication by $z_2$
amounts to a counterclockwise rotation by $\arg z_2$. Special case:
Multiplication by i amounts to a rotation by $\pi/2$.


It has already been noted that for real numbers complex multiplication
reduces to ordinary real multiplication. Inasmuch as the argument of a
negative real number may be taken as $\pm\pi$, the rules (2-11) can be thought
of as a generalization of the rule “minus × minus = plus” of high school
algebra.

We now compute the quotient of two complex numbers given in polar
form. Using (2-7) we find

    $\frac{z_2}{z_1} = \frac{r_1 r_2(\cos\varphi_1\cos\varphi_2 + \sin\varphi_1\sin\varphi_2)}{r_1^2} + i\,\frac{r_1 r_2(\cos\varphi_1\sin\varphi_2 - \cos\varphi_2\sin\varphi_1)}{r_1^2}$

    $= \frac{r_2}{r_1}[\cos(\varphi_2 - \varphi_1) + i\sin(\varphi_2 - \varphi_1)].$

The expression on the right is the polar form of the complex number with
absolute value $r_2/r_1$ and argument $\varphi_2 - \varphi_1$. We thus have the formulae

(2-12)    $\left|\frac{z_2}{z_1}\right| = \frac{|z_2|}{|z_1|}, \qquad \arg\left(\frac{z_2}{z_1}\right) = \arg z_2 - \arg z_1.$

An important special case arises when $z_2 = 1$. Since |1| = 1, arg 1 = 0,
we then have, writing z for $z_1$,

(2-13)    $\left|\frac{1}{z}\right| = \frac{1}{|z|}, \qquad \arg\left(\frac{1}{z}\right) = -\arg z.$


The construction of the reciprocal 1/z for a given z thus involves two
interchangeable steps: (i) Multiply z by $1/|z|^2$; (ii) Reverse the sign of the
argument of the number thus obtained. The second of these steps,
reversal of the sign of the argument, geometrically amounts to a reflection
of the number on the real axis. As this operation of reflection occurs
frequently in other contexts, too, a special notation has been devised for it.
If z = a + ib is any complex number, reflection on the real axis yields the
number a - ib. This number is called the complex conjugate of z and
is usually denoted by $\bar{z}$. Evidently,

(2-14)    $|\bar{z}| = |z|, \qquad \arg\bar{z} = -\arg z.$
The real and imaginary parts of a complex number can be expressed in
terms of a complex number and its conjugate. By adding and subtracting
the relations z = a + ib, $\bar{z} = a - ib$ we obtain

    $z + \bar{z} = 2a, \qquad z - \bar{z} = 2ib,$

hence

(2-15)    $a = \operatorname{Re} z = \frac{z + \bar{z}}{2}, \qquad b = \operatorname{Im} z = \frac{z - \bar{z}}{2i}.$

From

    $z\bar{z} = (a + ib)(a - ib) = a^2 + b^2 = |z|^2$

there follows the relation

(2-16)    $|z| = \sqrt{z\bar{z}}.$


The following rules of computation are easily verified:

(2-17)    $\overline{z_1 + z_2} = \bar{z}_1 + \bar{z}_2,$
          $\overline{z_1 z_2} = \bar{z}_1\bar{z}_2.$

These rules say, in effect, that in forming the complex conjugate of a sum
or a product it is immaterial whether we take the complex conjugate of the
result of the algebraic operation, or whether we perform the algebraic
operation on the complex conjugate quantities. In the case of the
reciprocal we have

    $\overline{\left(\frac{1}{z}\right)} = \frac{1}{\bar{z}}.$


We shall use the rules just established to give a purely analytical proof
of the so-called triangle inequalities. These inequalities say that in an
arbitrary triangle the length of one edge is at most equal to the sum, and
at least equal to the difference, of the lengths of the two other sides. If
we represent the sides of the triangle by the vectors $z_1$, $z_2$, and $z_1 + z_2$
(see Fig. 2.2e), the triangle inequalities are given by

(2-18)    $\bigl|\,|z_1| - |z_2|\,\bigr| \le |z_1 + z_2| \le |z_1| + |z_2|.$


To prove this analytically, we note that by (2-16) and (2-17)

(2-19)    $|z_1 + z_2|^2 = (z_1 + z_2)\overline{(z_1 + z_2)}$
                         $= (z_1 + z_2)(\bar{z}_1 + \bar{z}_2)$
                         $= z_1\bar{z}_1 + z_1\bar{z}_2 + z_2\bar{z}_1 + z_2\bar{z}_2$
                         $= |z_1|^2 + z_1\bar{z}_2 + z_2\bar{z}_1 + |z_2|^2.$

In view of $z_2\bar{z}_1 = \overline{\bar{z}_2 z_1} = \overline{z_1\bar{z}_2}$ we have by (2-15)

    $z_1\bar{z}_2 + z_2\bar{z}_1 = z_1\bar{z}_2 + \overline{z_1\bar{z}_2} = 2\operatorname{Re} z_1\bar{z}_2.$

Now for an arbitrary complex number z,

    $|\operatorname{Re} z| \le |z|.$

Setting $z = z_1\bar{z}_2$, we thus have by (2-11) and (2-14)

    $|\operatorname{Re} z_1\bar{z}_2| \le |z_1\bar{z}_2| = |z_1|\,|\bar{z}_2| = |z_1|\,|z_2|.$

Thus there follows from (2-19)

    $|z_1 + z_2|^2 \le |z_1|^2 + |z_2|^2 + 2|z_1|\,|z_2| = (|z_1| + |z_2|)^2$

and hence by taking square roots,

    $|z_1 + z_2| \le |z_1| + |z_2|.$

On the other hand, since $\operatorname{Re} z \ge -|z|$, we also have

    $|z_1 + z_2|^2 \ge |z_1|^2 + |z_2|^2 - 2|z_1|\,|z_2| = (|z_1| - |z_2|)^2$

and hence, by taking square roots,

    $|z_1 + z_2| \ge \bigl|\,|z_1| - |z_2|\,\bigr|.$

This completes the proof of (2-18).
Problems 
4. Compute the absolute value and all arguments of the complex numbers 


brit 
he 


(a) —22, (Ὁ) —6 + £8, (c) 


5. Let $w = -\frac{1}{2} + i(\sqrt{3}/2)$. Show algebraically that $w^3 = 1$. What do
you conclude about arg w? What about |w|? Verify your conclusions
by calculating |w| and arg w.


6. Show that the arguments $\varphi$ of the complex number z = a + ib ($a \ne 0$)
are given by

    $\varphi = \arctan\frac{b}{a} + 2\pi k,$  if a > 0,

    $\varphi = \arctan\frac{b}{a} + (2k + 1)\pi,$  if a < 0,

where k denotes an arbitrary integer, and where arctan x denotes the
principal value (lying between $-\pi/2$ and $\pi/2$) of the arcus tangens function.
What is arg z if a = 0?

7. Using induction, prove that for arbitrary complex numbers $z_1, z_2, \ldots, z_n$

    $|z_1 + z_2 + \cdots + z_n| \le |z_1| + |z_2| + \cdots + |z_n|.$

8. Determine the equation of the locus of those points z = x + iy of the
complex plane with the following property: The ratio of the distances of
z from the points +1 and -1 has the constant value k. (Hint: The
distances can be expressed in the form |z - 1| and |z + 1|.)

9. What is $i^n$ for arbitrary integral n?


2.3 Powers and Roots 


Applying the multiplication formula (2-11) to a product of n equal
factors $z = r(\cos\varphi + i\sin\varphi)$, we obtain

(2-20)    $[r(\cos\varphi + i\sin\varphi)]^n = r^n(\cos n\varphi + i\sin n\varphi).$

Setting r = 1, there results Moivre’s formula

(2-21)    $(\cos\varphi + i\sin\varphi)^n = \cos n\varphi + i\sin n\varphi.$

We expand the expression on the left by the binomial theorem and
observe that $i^2 = -1$, $i^3 = -i$, $i^4 = 1$, .... Equating real and imaginary
parts on both sides of (2-21), we obtain

    $\cos^n\varphi - \binom{n}{2}\cos^{n-2}\varphi\,\sin^2\varphi + \binom{n}{4}\cos^{n-4}\varphi\,\sin^4\varphi - \cdots = \cos n\varphi,$

    $\binom{n}{1}\cos^{n-1}\varphi\,\sin\varphi - \binom{n}{3}\cos^{n-3}\varphi\,\sin^3\varphi + \cdots = \sin n\varphi.$

Here the symbol $\binom{n}{m}$ denotes the binomial coefficient defined by

    $\binom{n}{m} = \frac{n!}{m!\,(n - m)!}, \qquad m = 0, 1, \ldots, n.$


EXAMPLE 
12. For n = 3 we obtain

    $\cos 3\varphi = \cos^3\varphi - \binom{3}{2}\cos\varphi\,\sin^2\varphi = \cos^3\varphi - 3\cos\varphi\,\sin^2\varphi,$

    $\sin 3\varphi = \binom{3}{1}\cos^2\varphi\,\sin\varphi - \binom{3}{3}\sin^3\varphi = 3\cos^2\varphi\,\sin\varphi - \sin^3\varphi.$
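
Moivre's formula and the triple-angle identities of the example can be verified numerically for a sample angle. The sketch below is an illustration only; the function name `de_moivre` and the sample value of the angle are arbitrary choices.

```python
import math


def de_moivre(phi, n):
    """Return both sides of (cos phi + i sin phi)^n = cos(n phi) + i sin(n phi)."""
    lhs = complex(math.cos(phi), math.sin(phi)) ** n
    rhs = complex(math.cos(n * phi), math.sin(n * phi))
    return lhs, rhs


phi = 0.7
lhs, rhs = de_moivre(phi, 3)

# Real and imaginary parts of the binomial expansion for n = 3.
c, s = math.cos(phi), math.sin(phi)
cos3 = c**3 - 3 * c * s**2
sin3 = 3 * c**2 * s - s**3

print(lhs, rhs)
print(cos3, sin3)
```

The two printed pairs agree up to rounding, which is all that floating-point arithmetic can confirm about an exact identity.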


We are now ready to tackle the problem of computing the nth roots of a
complex number. In the real domain we call nth root of a real number a
any number x satisfying the equation

    $x^n = a.$

Analogously we shall call nth root of the complex number w any number
z = x + iy satisfying the equation

(2-22)    $z^n = w.$

If w = 0 this clearly has the only solution z = 0. In order to determine
all possible solutions z for $w \ne 0$, we write

    $w = \rho(\cos\alpha + i\sin\alpha)$

and seek to determine the polar form $z = r(\cos\varphi + i\sin\varphi)$. By (2-20),

    $z^n = r^n(\cos n\varphi + i\sin n\varphi),$

and condition (2-22) assumes the form

(2-23)    $r^n(\cos n\varphi + i\sin n\varphi) = \rho(\cos\alpha + i\sin\alpha).$


This condition evidently will be satisfied if

    $r^n = \rho$

and

    $n\varphi = \alpha.$

Thus the number

    $z = \sqrt[n]{\rho}\left(\cos\frac{\alpha}{n} + i\sin\frac{\alpha}{n}\right)$

certainly is an nth root of the complex number $\rho(\cos\alpha + i\sin\alpha)$. Is it the
only one? The absolute value of a nonzero complex number being
positive, $\sqrt[n]{\rho}$ evidently is the only possible absolute value of z. However,
other values of the argument are possible. We recall that the argument of
a complex number is determined only up to multiples of $2\pi$. Thus
condition (2-23) is also satisfied if

    $n\varphi = \alpha + 2\pi k,$

where k is an arbitrary integer. We thus obtain an infinity of possible
values of $\varphi$:

    $\varphi = \frac{\alpha}{n} + k\,\frac{2\pi}{n}, \qquad k = 0, \pm 1, \pm 2, \ldots.$

Not all these values yield different solutions of (2-23), however. The
same solution z is defined by two values of $\varphi$ that differ merely by an
integral multiple of $2\pi$. This will occur as soon as the corresponding
values of k differ by an integral multiple of n. Different solutions of
(2-23) thus are obtained only by selecting the values k = 0, 1, ..., n - 1.
We summarize this result as follows: 


Theorem 2.2 The equation $z^n = w$, where $w \ne 0$ and n is a positive
integer, has exactly n solutions. They are given by

    $z = r(\cos\varphi + i\sin\varphi),$

where

    $r = \sqrt[n]{|w|}$ and $\varphi = \frac{\arg w}{n} + \frac{2\pi k}{n}, \qquad k = 0, 1, 2, \ldots, n - 1.$


Geometrically, these solutions all lie on a circle of radius r. The 
argument of one solution is 1/n times the argument (any argument) of w;
the remaining solutions divide the circumference of the circle in n equal 
parts. 
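
The theorem gives a direct recipe for computing all nth roots. The following sketch uses the standard `cmath.phase` to obtain one admissible argument of w and `cmath.rect` to convert back from polar form; the function name `nth_roots` is hypothetical.

```python
import cmath
import math


def nth_roots(w, n):
    """All n solutions of z**n = w for w != 0 and a positive integer n."""
    if w == 0 or n < 1:
        raise ValueError("requires w != 0 and a positive integer n")
    r = abs(w) ** (1.0 / n)  # the common absolute value of all roots
    alpha = cmath.phase(w)   # one admissible value of arg w
    return [cmath.rect(r, (alpha + 2 * math.pi * k) / n) for k in range(n)]


# The four solutions of z^4 = i.
roots = nth_roots(1j, 4)
for z in roots:
    print(z, z**4)
```

Raising each computed root to the fourth power reproduces i up to rounding, and the arguments of consecutive roots differ by a quarter turn.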


EXAMPLES 
13. To compute all solutions of $z^4 = i$. Since |i| = 1, all solutions have
absolute value one. In view of $\arg i = \pi/2$, one solution has the argument
$\pi/8$; the remaining solutions divide the unit circle into four equal parts
(see Fig. 2.3).

14. In the real domain the number 1 has either one or two nth roots,
according to whether n is odd or even. In the complex domain the
number 1 = 1 + i0 must have n nth roots. Since $\sqrt[n]{1} = 1$, all roots
have absolute value 1. Since arg 1 = 0, the arguments of these n nth
roots of unity are given by $2\pi k/n$, k = 0, 1, ..., n - 1. Special example:
For n = 3 we obtain the following three third roots of unity:

    $\cos\frac{2\pi\cdot 0}{3} + i\sin\frac{2\pi\cdot 0}{3} = 1,$

    $\cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3} = -\frac{1}{2} + i\,\frac{\sqrt{3}}{2},$

    $\cos\frac{4\pi}{3} + i\sin\frac{4\pi}{3} = -\frac{1}{2} - i\,\frac{\sqrt{3}}{2}.$


Figure 2.3


For n even one root of unity (namely the one obtained for k = n/2) has the
argument

    $\frac{2\pi}{n}\cdot\frac{n}{2} = \pi,$

and thus is real and negative, i.e., we obtain two real nth roots of unity.
For n odd the equation

    $\frac{2\pi k}{n} = \pi$

is not satisfied for any integer k, thus there is no negative root of unity.
The example shows how by introducing complex numbers cumbersome
distinctions of special cases can be avoided or dealt with from a unified
point of view.


Problems 


10. Determine all solutions of the equation 


ΓΕΣ ΤΡ fess oS Ὁ, 
(Use problem 3.)


11. Find closed expressions for the sums

    $A_n = 1 + r\cos\varphi + r^2\cos 2\varphi + \cdots + r^n\cos n\varphi,$
    $B_n = r\sin\varphi + r^2\sin 2\varphi + \cdots + r^n\sin n\varphi.$

(Hint: Let $z = r(\cos\varphi + i\sin\varphi)$ and consider $A_n + iB_n$.)
12. Determine all solutions of the equations 
a) zt=-4, (B+) =i, (©) 22 =3V3. 


13. Express the function cos 4φ as a polynomial in cos φ.
14. Express the two values of $\sqrt{x + iy}$ in the form a + ib. (Hint: Square
and compare real and imaginary parts.)


2.4 The Complex Exponential Function 


Beginning with this section we shall study certain important functions of 
a complex variable. In the real domain, a function is defined if to every 
real number of a certain set of real numbers (called the domain of the 
function) there is made to correspond (for instance, by an algebraic 
formula) a number of another set (called the range of the function). A 
complex function is defined in the same manner, with the difference that 
now both domain and range of the function are, in general, sets of complex 
numbers. 

We begin by defining the exponential function for complex values of 
the variable. In the real domain the exponential function can be defined 
by the exponential series, 


$$e^x = 1 + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots.$$


Replacing x by iy, where y is real, we obtain formally 


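$$e^{iy} = 1 + \frac{iy}{1!} + \frac{(iy)^2}{2!} + \frac{(iy)^3}{3!} + \cdots.$$
Using the fact that $i^2 = -1$, $i^3 = -i$, $i^4 = 1, \ldots$, this may be written
$$e^{iy} = 1 - \frac{y^2}{2!} + \frac{y^4}{4!} - \cdots + i\left(\frac{y}{1!} - \frac{y^3}{3!} + \frac{y^5}{5!} - \cdots\right).$$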


The two series on the right are recognized as the Maclaurin expansions of 
cos y and sin y. Thus we obtain


$$e^{iy} = \cos y + i\sin y. \tag{2-24}$$


The above computations are purely formal in the sense that they lack 
analytic justification. (It has not been defined, for instance, what is 
meant by the sum of an infinite series with complex terms, nor do we know 
whether it is permissible to rearrange the terms in such a series.) How- 
ever, nothing keeps us from adopting equation (2-24) as the definition of 
the exponential function $e^z$ for purely imaginary values of z. It will soon
become clear that this definition is reasonable on several grounds.
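
Although the formal series manipulation is not a proof, it can at least be checked numerically. The sketch below is an illustration under stated assumptions: the helper name `exp_series` and the sample value y = 0.75 are choices of this example. It sums the first terms of the exponential series with the purely imaginary argument iy and compares the result with cos y + i sin y.

```python
import math

def exp_series(z, terms=40):
    """Partial sum 1 + z/1! + z**2/2! + ... using `terms` terms."""
    total = 0j
    term = 1 + 0j                 # current term z**k / k!, starting at k = 0
    for k in range(terms):
        total += term
        term *= z / (k + 1)       # turn z**k/k! into z**(k+1)/(k+1)!
    return total

y = 0.75
lhs = exp_series(1j * y)                       # series with x replaced by iy
rhs = complex(math.cos(y), math.sin(y))        # right-hand side of (2-24)
print(abs(lhs - rhs))                          # difference at rounding level
```

Agreement to rounding error for sample values supports the definition but does not replace the analytic justification mentioned in the text.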

We notice, first of all, that the polar form of a complex number $z \neq 0$,
where $r = |z|$ and $\varphi = \arg z$, can be written thus:


$$z = r e^{i\varphi}.$$


We also note the following properties of the function $e^{iy}$, which are
analogous to properties of the real exponential function:


(a) $e^{i0} = \cos 0 + i\sin 0 = 1 + i0 = 1.$
For the one number which is real as well as purely imaginary, the new 
definition (2-24) is compatible with the real definition. 


(b) The real exponential function satisfies the so-called addition
theorem

$$e^{x_1} e^{x_2} = e^{x_1 + x_2}$$

for arbitrary real numbers $x_1$ and $x_2$. In analogy, the following identity
holds for arbitrary real $y_1$ and $y_2$:


$$e^{iy_1} e^{iy_2} = e^{i(y_1 + y_2)}. \tag{2-25}$$
The proof follows from the multiplication rule (2-11) in view of the fact
that $e^{iy}$ is a complex number with absolute value 1 and argument y.

The following two properties of the complex exponential function have
no direct analog in the realm of real numbers.

(c) We have $e^{2\pi i} = \cos 2\pi + i\sin 2\pi$ and hence


$$e^{2\pi i} = 1.$$


(This formula connects several important numbers of analysis.)
(d) It follows from (2-25) and (c) that


$$e^{i(y + 2\pi)} = e^{iy} e^{2\pi i} = e^{iy}$$


for arbitrary y. This relation expresses the fact that the exponential
function is periodic with period $2\pi i$.


Above, we have defined $e^{iy}$ in terms of cos y and sin y. We now shall
show how to express the trigonometric functions in terms of the exponential
function. Replacing in (2-24) y by -y, we get


$$e^{-iy} = \cos(-y) + i\sin(-y) = \cos y - i\sin y. \tag{2-26}$$
Adding and subtracting (2-24) and (2-26) and solving for cos y and sin y,
we obtain Euler’s formulas


$$\cos y = \frac{e^{iy} + e^{-iy}}{2}, \qquad \sin y = \frac{e^{iy} - e^{-iy}}{2i}. \tag{2-27}$$


EXAMPLE 

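15. Moivre’s formulas enable us to express cos nφ and sin nφ in terms of
powers of cos φ and sin φ. Here we are concerned with the converse
problem: To express a power of cos φ or sin φ as a linear combination of
the functions cos φ, cos 2φ, ..., sin φ, sin 2φ, .... We explain the method
by considering the function $\cos^4\varphi$. By Euler’s formula,


$$(\cos\varphi)^4 = \left(\frac{e^{i\varphi} + e^{-i\varphi}}{2}\right)^4$$


or, using the binomial theorem, since $(e^{i\varphi})^n = e^{in\varphi}$,


$$(\cos\varphi)^4 = \tfrac{1}{16}\left(e^{4i\varphi} + 4e^{2i\varphi} + 6 + 4e^{-2i\varphi} + e^{-4i\varphi}\right).$$


Collecting the terms with the same absolute value of the exponents and
applying Euler’s formula again, we find


$$\begin{aligned}
(\cos\varphi)^4 &= \tfrac{1}{16}\left(e^{4i\varphi} + e^{-4i\varphi}\right) + \tfrac{4}{16}\left(e^{2i\varphi} + e^{-2i\varphi}\right) + \tfrac{6}{16} \\
&= \tfrac{1}{8}\cos 4\varphi + \tfrac{1}{2}\cos 2\varphi + \tfrac{3}{8}.
\end{aligned}$$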
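
The method of this example works for any positive integer power of cos φ. The Python sketch below is a hedged illustration; the function name `cos_power_coefficients` is an assumption of this example. It expands ((e^{iφ} + e^{-iφ})/2)^n by the binomial theorem and collects the terms whose exponents have the same absolute value, which yields the coefficients c_k in cos^n φ = Σ c_k cos kφ.

```python
import math

def cos_power_coefficients(n):
    """Return {k: c_k} such that cos(phi)**n == sum of c_k * cos(k*phi)."""
    coeffs = {}
    for j in range(n + 1):
        k = abs(n - 2 * j)        # absolute value of the exponent n - 2j
        # Each binomial term contributes C(n, j) / 2**n; the two terms with
        # exponents +k and -k together give the coefficient of cos(k*phi).
        coeffs[k] = coeffs.get(k, 0.0) + math.comb(n, j) / 2 ** n
    return coeffs

c4 = cos_power_coefficients(4)
print(c4)                          # coefficients of cos 4phi, cos 2phi, constant

phi = 0.3
check = sum(c * math.cos(k * phi) for k, c in c4.items())
print(abs(check - math.cos(phi) ** 4))
```

For n = 4 the coefficients returned are 1/8, 1/2 and 3/8, in agreement with the hand computation.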
There remains the problem of defining $e^z$ for an arbitrary complex
number z = x + iy. We adopt the following definition:
$$e^{x + iy} = e^x e^{iy} = e^x(\cos y + i\sin y). \tag{2-28}$$
With this definition, the addition theorem


$$e^{z_1} e^{z_2} = e^{z_1 + z_2} \tag{2-29}$$


holds for arbitrary complex numbers $z_1 = x_1 + iy_1$ and $z_2 = x_2 + iy_2$.


Proof:

$$\begin{aligned}
e^{z_1} e^{z_2} &= e^{x_1} e^{iy_1} e^{x_2} e^{iy_2} && \text{(by (2-28))} \\
&= e^{x_1 + x_2} e^{i(y_1 + y_2)} && \text{(by (2-25))} \\
&= e^{x_1 + x_2 + i(y_1 + y_2)} && \text{(by (2-28))} \\
&= e^{z_1 + z_2}.
\end{aligned}$$


As a consequence of the addition theorem we find that for any positive
integer n


$$(e^z)^n = e^{nz};$$


furthermore, setting $z_1 = z$ and $z_2 = -z$ and observing property (a) we
obtain
$$e^z e^{-z} = e^0 = 1.$$


Consequently the relation
$$e^{-z} = \frac{1}{e^z},$$


familiar for real values of z, holds true also for arbitrary complex values.
It shows, among other things, that the complex exponential function
never assumes the value zero.


Problems 


15. Express sin* » cos* » in terms of the functions cos m, cos 2p,.... 
16. Prove that 


2n 
i 2" dp = 2m 
[°" (cosy dp = an Ltd 


17. The logarithm y = log x of a real, positive number x is defined as the
unique solution of the equation $e^y = x$. How would you define the
logarithm of a complex number w? Does your definition lead to a unique
value of log w? What are the possible value(s) of log (-1)?

18. If t is regarded as the time, the point $z = z(t) = Re^{it}$ travels on a circle of
radius R. What is the locus of the points

(a) $z = ae^{it} + be^{-it}$ (a, b real);
(b) $z = \cos 2t + i\cos t$ (0 ≤ t ≤ π)?

19. Prove that $e^{\bar z} = \overline{e^z}$ for all complex z.


2.5 Polynomials 


Let $a_0, a_1, a_2, \ldots, a_n$ be n + 1 arbitrary complex numbers, $a_0 \neq 0$.
The complex-valued function p defined by


$$p(z) = a_0 z^n + a_1 z^{n-1} + a_2 z^{n-2} + \cdots + a_n \tag{2-30}$$


is called a polynomial of degree n. The constants $a_0, \ldots, a_n$ are called the
coefficients of the polynomial. The number $a_0$ is called the leading
coefficient. A polynomial is called real if all its coefficients are real.

Any real or complex number z for which
$$p(z) = 0$$


is called a zero of the polynomial p. We know that a real polynomial
does not necessarily have real zeros; it suffices to recall the case of
polynomials of degree 2. In the complex case, however, the situation is
different.


Theorem 2.5a Every polynomial of degree n ≥ 1 has at least one
zero.


This is the so-called fundamental theorem of algebra, which was first 
stated and proved (in five different ways) by C. F. Gauss (1777-1855). 
The proofs of Gauss were either algebraically involved, or they used 
advanced methods. Today the modern tools of analysis make it possible 
to give a relatively simple proof that is accessible to any student of 
advanced calculus (see e.g., Landau [1951], p. 233). However, a presenta- 
tion of this proof would lead us too far astray from the main themes of 
this book and it is therefore omitted. 

Taking the fundamental theorem for granted, we now discuss several of 
its applications. 


Theorem 2.5b Let p be a polynomial of degree n ≥ 1, and let
$p(z_1) = 0$. Then there exists a polynomial $p_1$ of degree n - 1 such
that
$$p(z) = (z - z_1)\,p_1(z)$$
identically in z.
Proof. Let the polynomial p be given by (2-30). In view of $p(z_1) = 0$ we
have


$$\begin{aligned}
p(z) &= p(z) - p(z_1) \\
&= a_0(z^n - z_1^n) + a_1(z^{n-1} - z_1^{n-1}) + \cdots + a_{n-1}(z - z_1).
\end{aligned}$$


We now make use of the identities
$$z^k - z_1^k = (z - z_1)(z^{k-1} + z^{k-2} z_1 + z^{k-3} z_1^2 + \cdots + z_1^{k-1})$$
(k = 1, 2, ..., n). Factoring out $z - z_1$, we obtain


$$\begin{aligned}
p(z) = (z - z_1)\bigl[\, & a_0(z^{n-1} + z^{n-2} z_1 + \cdots + z_1^{n-1}) \\
& + a_1(z^{n-2} + z^{n-3} z_1 + \cdots + z_1^{n-2}) \\
& + \cdots \\
& + a_{n-2}(z + z_1) + a_{n-1} \bigr].
\end{aligned}$$


Calling the expression in brackets $p_1(z)$, we find upon rearranging that
$$\begin{aligned}
p_1(z) = {} & a_0 z^{n-1} + (a_0 z_1 + a_1) z^{n-2} + \cdots \\
& + (a_0 z_1^{n-1} + a_1 z_1^{n-2} + \cdots + a_{n-1}).
\end{aligned}$$


$p_1$ thus is again a polynomial; since $a_0 \neq 0$, the degree of $p_1$ is n - 1.
This completes the proof of theorem 2.5b.
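
The coefficients of the quotient polynomial found in the proof, namely a_0, a_0 z_1 + a_1, a_0 z_1^2 + a_1 z_1 + a_2, and so on, can be generated one after another. The following Python sketch is an illustration, not the author's algorithm; the name `deflate` and the coefficient-list convention [a_0, ..., a_n] are assumptions of this example.

```python
def deflate(coeffs, z1):
    """Divide p(z) = a_0 z^n + ... + a_n by (z - z1).

    Returns (quotient coefficients, remainder). The remainder equals p(z1),
    so it is zero exactly when z1 is a zero of p.
    """
    partial = []
    acc = 0
    for a in coeffs:
        acc = acc * z1 + a        # a_0, a_0*z1 + a_1, a_0*z1^2 + a_1*z1 + a_2, ...
        partial.append(acc)
    return partial[:-1], partial[-1]

# p(z) = z^4 - 4z^3 + 6z^2 - 4z + 1 has the zero z = 1.
p1, remainder = deflate([1, -4, 6, -4, 1], 1)
print(p1, remainder)
```

Each quotient coefficient is obtained from the previous one by one multiplication and one addition, so splitting off a linear factor costs about n multiplications.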

Let now p be a polynomial of degree n > 1. By the fundamental
theorem, it has at least one zero, $z_1$ say. By theorem 2.5b, p can thus be
represented in the form $(z - z_1)p_1(z)$, where $p_1$ is of degree n - 1 ≥ 1.
Again by the fundamental theorem, $p_1$ has at least one zero, say $z_2$, and
thus is representable in the form $(z - z_2)p_2(z)$, where $p_2$ is a polynomial of
degree n - 2. The process evidently can be continued until we arrive at
a polynomial $p_n$ of degree zero. Since the leading coefficient of every $p_k$
is $a_0$, it follows that $p_n(z) = a_0 \neq 0$. We thus find the following
representation for the polynomial p:


$$p(z) = (z - z_1)(z - z_2) \cdots (z - z_n)\,a_0. \tag{2-31}$$


We thus have represented p(z) as a product of polynomials of degree 1, or
linear polynomials, of the form $z - z_k$. The numbers $z_k$ (k = 1, 2, ..., n)
evidently all are zeros of p. Any number different from all the $z_k$ cannot
be a zero, since a product of nonzero complex numbers never vanishes.
It is not asserted that the numbers $z_k$ are all different. However, we have
proved the following: A polynomial of degree n ≥ 1 has, at most, n distinct
zeros.

An important consequence of this statement is as follows: If two
polynomials p and q, both known to have degrees ≤ n, assume the same
values at n + 1 points, then they are identical. For the proof, consider
the polynomial p - q. Its degree is at most n, and yet its value is zero at
the n + 1 points where p and q agree. This can only be if p - q vanishes
identically.

Above, we have in the relation (2-31) represented a polynomial of degree 
n as a product of linear factors. It is conceivable that this representation 
depends on the order in which the zeros $z_1, z_2, \ldots, z_n$ have been split off.
However, we now shall show that for a given polynomial p the
representation (2-31) is unique (apart, obviously, from the order in which the
factors appear).

Our assertion is trivial if p has n distinct zeros, for in this case each
zero must appear in the representation (2-31), and since each must occur,
each can occur only once. It is quite possible, however, that a
polynomial of degree n > 1 has fewer than n distinct zeros, and that as a
consequence the numbers $z_1, z_2, \ldots, z_n$ are not all different from each other.


EXAMPLE
16. The polynomial $p(z) = z^4 - 4z^3 + 6z^2 - 4z + 1$ has the
representation


$$p(z) = (z - 1)(z - 1)(z - 1)(z - 1) = (z - 1)^4.$$


There is only one zero, z = 1; in the representation (2-31) we have


$$z_1 = z_2 = z_3 = z_4 = 1.$$


In order to prove the uniqueness of the representation (2-31) even in the
case of repeated zeros, assume p has another representation of the same
form, say


$$p(z) = (z - z_1')(z - z_2') \cdots (z - z_n')\,b_0. \tag{2-32}$$
From equations (2-31) and (2-32) we obtain the identity
$$(z - z_1)(z - z_2) \cdots (z - z_n)\,a_0 = (z - z_1')(z - z_2') \cdots (z - z_n')\,b_0.$$
The expression on the left is zero for $z = z_1$. Hence the expression on the
right, too, must vanish when $z = z_1$. This is possible only if one of the
numbers $z_k'$, say $z_1'$, equals $z_1$. Cancelling the factor $z - z_1$, we obtain


$$(z - z_2) \cdots (z - z_n)\,a_0 = (z - z_2') \cdots (z - z_n')\,b_0.$$


Because we have divided by $z - z_1$, this identity is at the moment proved
for $z \neq z_1$ only; however, since both sides are polynomials (of degree
n - 1), and since the set of all points $\neq z_1$ comprises more than n - 1
points, the identity must in fact hold for all z. Now we can continue as
above: One of the numbers $z_2', \ldots, z_n'$, say $z_2'$, must be equal to $z_2$ (even
though $z_2$ could be the same as $z_1$). Splitting off the factor $z - z_2$
and continuing in the same vein, we find that for a suitable numbering
of the $z_k'$,
$$z_k' = z_k \quad (k = 1, 2, \ldots, n), \qquad b_0 = a_0.$$


We thus have proved: 


Theorem 2.5c A polynomial $p(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$ of
degree n can be represented in a unique manner (up to the order of
the factors) in the form


$$p(z) = (z - z_1)(z - z_2) \cdots (z - z_n)\,a_0.$$


Each zero of p occurs at least once among the numbers $z_1, z_2, \ldots, z_n$.


If a zero z of p occurs precisely k times among the numbers $z_1, z_2, \ldots, z_n$,
the zero is said to have multiplicity k. A zero of multiplicity one is also
called a simple zero. The zeros alone do not completely determine a
polynomial, not even up to a constant factor, but the zeros together with
their multiplicities do.


EXAMPLE 

17. The two third degree polynomials
$$p(z) = (z - 1)(z + 1)^2$$
$$q(z) = (z - 1)^2(z + 1)$$

both have the zeros z = 1 and z = -1, and no other zeros. In the case
of p, z = 1 has multiplicity 1 and z = -1 has multiplicity 2. In the case
of q the multiplicities are reversed.


Multiplying out the factors in (2-31) and comparing the coefficients 
with those of (2-30) we obtain relations between the numbers z, and the 
coefficients of the polynomial. These relations are known as Vieta’s 
formulas. The simplest of these formulas are those obtained by
comparing the coefficients of $z^{n-1}$ and by comparing the constant terms:


$$z_1 + z_2 + \cdots + z_n = -\frac{a_1}{a_0}, \tag{2-33}$$
$$z_1 z_2 \cdots z_n = (-1)^n\,\frac{a_n}{a_0}. \tag{2-34}$$
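
The two simplest Vieta relations, z_1 + ... + z_n = -a_1/a_0 and z_1 z_2 ... z_n = (-1)^n a_n/a_0, can be checked by multiplying out the linear factors numerically. The following Python sketch is illustrative; the function name `poly_from_zeros` and the sample zeros 1, 2, 3 with leading coefficient 2 are assumptions of this example.

```python
def poly_from_zeros(a0, zeros):
    """Expand a0*(z - z_1)...(z - z_n) into the coefficients [a_0, ..., a_n]."""
    coeffs = [a0]
    for zk in zeros:
        # Multiply the current polynomial by (z - zk): the new coefficient
        # at each position is c_i - zk * c_{i-1}.
        coeffs = [c - zk * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs

zeros = [1, 2, 3]
a = poly_from_zeros(2, zeros)              # 2(z - 1)(z - 2)(z - 3)
n = len(zeros)

sum_of_zeros = sum(zeros)
product_of_zeros = 1
for zk in zeros:
    product_of_zeros *= zk

print(a)
print(sum_of_zeros, -a[1] / a[0])                  # compare with (2-33)
print(product_of_zeros, (-1) ** n * a[n] / a[0])   # compare with (2-34)
```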


The fundamental theorem of algebra guarantees the existence of at 
least one zero of any polynomial of positive degree, and thus indirectly of 
the representation (2-31) in terms of linear factors. A completely 
different problem is the actual computation of these zeros. As pointed 
out already in §1.3, explicit formulas (involving root operations) can be 
given only if the degree does not exceed four, and are frequently impractical 
already when the degree is three. For polynomials of degree > 4 the zeros 
can be found, in general, only by algorithmic techniques. However, for 
some special polynomials of arbitrary degree the zeros can be found 
explicitly. 


EXAMPLES 


18. Let $p(z) = z^n - 1$. According to §2.3 the zeros of this polynomial
are the nth roots of unity. Since $a_0 = 1$, the representation in terms of
linear factors is thus given by


$$z^n - 1 = (z - 1)(z - e^{i\varphi})(z - e^{2i\varphi}) \cdots (z - e^{(n-1)i\varphi}),$$


where $\varphi = 2\pi/n$.


19. To determine the representation of the polynomial
$$p(z) = 1 + z + z^2 + \cdots + z^n$$
in terms of linear factors. We find
$$z\,p(z) = z + z^2 + \cdots + z^n + z^{n+1},$$
and upon subtracting p(z) and dividing by z - 1,
$$p(z) = \frac{z^{n+1} - 1}{z - 1} \quad (z \neq 1).$$


The zeros of p thus are the (n + 1)st roots of unity, with the exception of
z = 1. For instance for n = 3,


$$1 + z + z^2 + z^3 = (z - i)(z + i)(z + 1).$$


The above theory refers to arbitrary polynomials with complex coeffi- 
cients. Real polynomials are contained in this class as a special case. 
We now shall state two theorems that are true only for real poly- 
nomials. 


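Theorem 2.5d If z is a zero of the real polynomial p, then $\bar z$ is a
zero also.


Proof. Let
$$p(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n$$


be the given real polynomial. If
$$0 = a_0 z^n + a_1 z^{n-1} + \cdots + a_n,$$


then by repeated applications of the relations (2-17)


$$\begin{aligned}
0 = \bar 0 &= \overline{a_0 z^n + a_1 z^{n-1} + \cdots + a_n} \\
&= \overline{a_0 z^n} + \overline{a_1 z^{n-1}} + \cdots + \overline{a_n} \\
&= \bar a_0\,\overline{z^n} + \bar a_1\,\overline{z^{n-1}} + \cdots + \bar a_n.
\end{aligned}$$


However, since the $a_k$ are real, $\bar a_k = a_k$ (k = 0, 1, ..., n); furthermore,
$\overline{z^k} = (\bar z)^k$. We thus have


$$0 = a_0 \bar z^n + a_1 \bar z^{n-1} + \cdots + a_n,$$


i.e., the number $\bar z$ is a zero of p.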

In addition to the statement of theorem 2.5d, it is also true that the
multiplicities of z and $\bar z$ are the same. For a proof see problem 26 in
§2.6. As a consequence, we find that in the factored representation of a
real polynomial each factor $z - z_k$ involving a nonreal $z_k$ can be grouped
with a factor $z - z_{k+1}$, where $z_{k+1} = \bar z_k$. Multiplying out such a pair of
complex conjugate factors we obtain
$$(z - z_k)(z - \bar z_k) = z^2 - (z_k + \bar z_k)z + z_k \bar z_k,$$
a quadratic polynomial with the real coefficients $-(z_k + \bar z_k) = -2\operatorname{Re} z_k$
and $z_k \bar z_k = |z_k|^2$. We thus have obtained:

Theorem 2.5e Any real polynomial of degree n ≥ 1 can be
represented in a unique manner as a product of real linear factors and of
real quadratic factors with complex conjugate zeros.


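EXAMPLES
20. $z^3 + z^2 + z + 1 = (z - i)(z + i)(z + 1) = (z^2 + 1)(z + 1).$
21. Let
$$\begin{aligned}
p(z) = z^6 - 1 = {} & (z - 1)\left(z - \frac{1 + i\sqrt{3}}{2}\right)\left(z - \frac{1 - i\sqrt{3}}{2}\right) \\
& \times (z + 1)\left(z - \frac{-1 + i\sqrt{3}}{2}\right)\left(z - \frac{-1 - i\sqrt{3}}{2}\right).
\end{aligned}$$


The representation by real factors is
$$p(z) = (z - 1)(z^2 - z + 1)(z + 1)(z^2 + z + 1).$$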
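
The grouping of conjugate zeros into real quadratic factors can be illustrated for the polynomial z^6 - 1, whose zeros are the sixth roots of unity. The Python sketch below is a hedged illustration: the names `real_factors` and `multiply` and the tolerance `tol` are assumptions of this example, and the input is assumed to contain each nonreal zero together with its conjugate.

```python
import cmath
import math

def real_factors(zeros, tol=1e-9):
    """Group zeros of a real polynomial into real linear and quadratic factors.

    A real zero x gives [1, -x]; a conjugate pair z, conj(z) gives
    [1, -2*Re(z), |z|**2]. Each pair is taken once, via its member with Im > 0.
    """
    factors = []
    for z in zeros:
        if abs(z.imag) <= tol:
            factors.append([1.0, -z.real])
        elif z.imag > 0:
            factors.append([1.0, -2 * z.real, abs(z) ** 2])
    return factors

def multiply(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

sixth_roots = [cmath.rect(1.0, 2 * math.pi * k / 6) for k in range(6)]
product = [1.0]
for factor in real_factors(sixth_roots):
    product = multiply(product, factor)
print(product)        # close to the coefficients of z^6 - 1
```

Multiplying the real factors back together recovers the coefficients 1, 0, 0, 0, 0, 0, -1 up to rounding error.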
Problems 


20. Prove: A real polynomial of odd degree has at least one real zero. 
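21. Applying Vieta’s formulas to $z^n - 1$, prove that


$$1 + \cos\frac{2\pi}{n} + \cos\left(2\,\frac{2\pi}{n}\right) + \cdots + \cos\left((n - 1)\frac{2\pi}{n}\right) = 0,$$


$$\sin\frac{2\pi}{n} + \sin\left(2\,\frac{2\pi}{n}\right) + \cdots + \sin\left((n - 1)\frac{2\pi}{n}\right) = 0.$$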


22. Find all zeros of the following polynomials: (a) $z^4 + 6z^2 + 25$, (b)
$z^4 - 6z^2 + 25$, (c) $z^4 + 14z^2 + 625$. (Use problem 14.)

23. The real polynomial $p(z) = z^4 + a_1 z^3 + a_2 z^2 + a_3 z + a_4$ is known to
have the zeros 1 + i and -1 - i. Determine the coefficients $a_1, \ldots, a_4$.


2.6 Multiplicity and Derivative 


In this section we wish to point out the following connection between
the multiplicity of a zero of a polynomial p and the values of the derivatives
of p at the zero: If the multiplicity of a zero z of a polynomial p is k, then


$$p(z) = p'(z) = \cdots = p^{(k-1)}(z) = 0, \qquad p^{(k)}(z) \neq 0. \tag{2-35}$$

The multiplicity of a zero of a polynomial thus can be determined without 
completely factoring the polynomial simply by evaluating the derivatives 
of the polynomial at the zero. 

For real zeros of real polynomials the relations (2-35) are a
straightforward consequence of elementary differentiation rules of calculus. If
$x_1$ is a real zero of multiplicity k, we need only to observe that by theorem
2.5c we can write
$$p(x) = (x - x_1)^k q(x), \tag{2-36}$$
where q is a polynomial such that $q(x_1) \neq 0$. Differentiating (2-36) by
means of the Leibnitz formula for differentiation of a product (see Kaplan
[1953], p. 19), we get precisely the relations (2-35).

For complex zeros and complex polynomials, however, there arises
first of all the question of what is meant by the derivatives $p', p'', \ldots$.
(After all, the variables considered in the definition of the derivative in 
calculus are always real.) It is possible to extend the limit definition of 
the derivative to the complex domain in such a manner that the calculus 
proof of (2-35) retains its validity. However, some of the issues involved 
in the theory of complex differentiation are fairly sophisticated, and their 
discussion is best left to a course in complex analysis. Fortunately, as far 
as polynomials are concerned, it is possible to discuss derivatives in a 
purely algebraic manner, without considering limits. 

If p is a polynomial of degree n with real or complex coefficients,


$$p(z) = a_0 z^n + a_1 z^{n-1} + \cdots + a_n, \tag{2-37}$$

we define the derivative of first order of p to be the polynomial of degree
n - 1,

$$p'(z) = n a_0 z^{n-1} + (n - 1) a_1 z^{n-2} + \cdots + a_{n-1}. \tag{2-38}$$

Derivatives of higher orders are defined recursively, for instance

$$p''(z) = (p')'(z) = n(n - 1) a_0 z^{n-2} + (n - 1)(n - 2) a_1 z^{n-3} + \cdots + 2a_{n-2}.$$
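
Because the derivative is defined purely in terms of the coefficients, it can be computed without any limit process. The following Python sketch is illustrative; the function name `derivative` and the coefficient-list convention [a_0, ..., a_n] are assumptions of this example. The coefficient a_k of z^(n-k) is multiplied by n - k, and the constant term is dropped.

```python
def derivative(coeffs):
    """Coefficients of p' for p(z) = a_0 z^n + ... + a_n, as in (2-38)."""
    n = len(coeffs) - 1
    # a_k belongs to z^(n-k); differentiating gives (n-k) * a_k * z^(n-k-1).
    return [(n - k) * a for k, a in enumerate(coeffs[:-1])]

p = [1, -4, 6, -4, 1]        # z^4 - 4z^3 + 6z^2 - 4z + 1
dp = derivative(p)           # first derivative
d2p = derivative(dp)         # second derivative, defined recursively
print(dp, d2p)
```

Applying the function repeatedly gives the derivatives of higher order; for a constant polynomial it returns an empty list, which this sketch uses to represent the zero polynomial.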
EXAMPLES†


22. Using binomial coefficients, the kth derivative of the polynomial
(2-37) can be expressed as follows:


$$p^{(k)}(z) = k!\left[\binom{n}{k} a_0 z^{n-k} + \binom{n-1}{k} a_1 z^{n-k-1} + \cdots + \binom{k}{k} a_{n-k}\right].$$


† Readers unfamiliar with binomial coefficients should at this point consult §3.5 for
the definition and basic properties of these coefficients.


23. If p denotes the polynomial (2-37), then clearly
$$p^{(n)}(z) = n!\,a_0.$$
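24. We wish to calculate the kth derivative of the special polynomial
$$p(z) = (z - a)^n = z^n - \binom{n}{1} a z^{n-1} + \binom{n}{2} a^2 z^{n-2} - \cdots + (-1)^n \binom{n}{n} a^n.$$


By example 22, we have


$$\begin{aligned}
p^{(k)}(z) = k!\biggl[ & \binom{n}{k} z^{n-k} - \binom{n-1}{k}\binom{n}{1} a z^{n-k-1} + \binom{n-2}{k}\binom{n}{2} a^2 z^{n-k-2} - \cdots \\
& + (-1)^{n-k}\binom{k}{k}\binom{n}{n-k} a^{n-k} \biggr].
\end{aligned}$$


The products of binomial coefficients occurring here can be recombined
as follows:
$$\begin{aligned}
\binom{n-m}{k}\binom{n}{m} &= \frac{(n-m)!}{k!\,(n-m-k)!} \cdot \frac{n!}{m!\,(n-m)!} \\
&= \frac{n!}{k!\,(n-k)!} \cdot \frac{(n-k)!}{m!\,(n-k-m)!} \\
&= \binom{n}{k}\binom{n-k}{m}.
\end{aligned}$$
Thus we have


$$p^{(k)}(z) = k!\binom{n}{k}\left[z^{n-k} - \binom{n-k}{1} a z^{n-k-1} + \binom{n-k}{2} a^2 z^{n-k-2} - \cdots + (-1)^{n-k}\binom{n-k}{n-k} a^{n-k}\right].$$


By the binomial theorem, the expression in brackets reduces to $(z - a)^{n-k}$.
We thus find
$$p^{(k)}(z) = \frac{n!}{(n-k)!}\,(z - a)^{n-k}.$$
This result is familiar, of course, when z and a are real.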


It is clear that the derivative defined by equation (2-38) satisfies some of
the laws that we expect from real calculus. Thus, if c is an arbitrary
constant, we have


$$(cp)' = cp';$$


furthermore, if p and q are any two polynomials, then
$$(p + q)' = p' + q'.$$
It is somewhat less obvious that the familiar rule for the differentiation of
a product also holds in the complex case:
$$(pq)' = p'q + pq'. \tag{2-39}$$
In order to prove (2-39), let p be given by (2-37) and q by
$$q(z) = b_0 z^m + b_1 z^{m-1} + \cdots + b_m.$$


We agree to set $a_k = 0$ for k > n and $b_k = 0$ for k > m. The product of
the two polynomials then can be written


$$p(z)q(z) = c_0 z^{n+m} + c_1 z^{n+m-1} + \cdots + c_{n+m},$$
where
$$c_k = a_0 b_k + a_1 b_{k-1} + \cdots + a_{k-1} b_1 + a_k b_0,$$


k = 0, 1, 2, ..., m + n. The coefficient of $z^{m+n-1-k}$ in the derivative of
pq then is $(m + n - k)c_k$. If, on the other hand, we form the expression
$p'q + pq'$ directly, we find
$$\begin{aligned}
p'(z)q(z) + p(z)q'(z)
= {} & (n a_0 z^{n-1} + (n - 1) a_1 z^{n-2} + \cdots + a_{n-1})(b_0 z^m + b_1 z^{m-1} + \cdots + b_m) \\
& + (a_0 z^n + a_1 z^{n-1} + \cdots + a_n)(m b_0 z^{m-1} + (m - 1) b_1 z^{m-2} + \cdots + b_{m-1}).
\end{aligned}$$


Collecting the coefficients of all terms combining to $z^{m+n-1-k}$ we find


$$\begin{aligned}
& n a_0 b_k + (n - 1) a_1 b_{k-1} + \cdots + (n - k) a_k b_0 \\
& \quad + a_0 (m - k) b_k + a_1 (m - k + 1) b_{k-1} + \cdots + a_k m b_0 \\
& = (n + m - k)(a_0 b_k + a_1 b_{k-1} + \cdots + a_k b_0).
\end{aligned}$$


Since this agrees with $(m + n - k)c_k$, our assertion is proved.
As in real calculus, we now obtain by induction from (2-39) the Leibnitz
formula for the kth derivative of a product of two polynomials:


$$(pq)^{(k)} = \binom{k}{0} p^{(k)} q + \binom{k}{1} p^{(k-1)} q' + \cdots + \binom{k}{k} p\,q^{(k)}. \tag{2-40}$$


We now are ready to prove: 


Theorem 2.6 Let p be a polynomial of positive degree, and let
$z = z_1$ be a zero of multiplicity m of p. Then


$$p(z_1) = p'(z_1) = \cdots = p^{(m-1)}(z_1) = 0, \qquad p^{(m)}(z_1) \neq 0.$$


Proof. By theorem 2.5c, the polynomial p can be written in the form
$$p(z) = (z - z_1)^m q(z), \tag{2-41}$$
where q is a polynomial such that $q(z_1) \neq 0$. We now form the kth
derivative of (2-41) by the Leibnitz formula (2-40). By example 24 we find,
for k ≤ m,


$$\begin{aligned}
p^{(k)}(z) = {} & \binom{k}{0}\frac{m!}{(m-k)!}(z - z_1)^{m-k} q(z) \\
& + \binom{k}{1}\frac{m!}{(m-k+1)!}(z - z_1)^{m-k+1} q'(z) + \cdots + \binom{k}{k}(z - z_1)^m q^{(k)}(z).
\end{aligned}$$


Here we set $z = z_1$ and distinguish two cases. If k < m, each term
contains a positive power of $z - z_1$ and thus vanishes for $z = z_1$. If k = m,
all terms except the first contain positive powers of $z - z_1$. The first
term reduces to a nonzero constant times q(z), which is ≠ 0 for $z = z_1$.

Thus theorem 2.6 is proved.
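
Theorem 2.6 suggests a simple procedure for finding the multiplicity of a known zero: evaluate p, p', p'', ... at the zero and count how many of them vanish before the first nonzero value. The following Python sketch is illustrative only; the names `horner`, `derivative`, and `multiplicity` and the tolerance parameter `tol` are assumptions of this example. With integer or rational coefficients the test for zero is exact; with floating-point data a small positive tolerance would be needed.

```python
def horner(coeffs, z):
    """Evaluate a_0 z^n + ... + a_n at z."""
    acc = 0
    for a in coeffs:
        acc = acc * z + a
    return acc

def derivative(coeffs):
    """Coefficients of the derivative according to (2-38)."""
    n = len(coeffs) - 1
    return [(n - k) * a for k, a in enumerate(coeffs[:-1])]

def multiplicity(coeffs, z, tol=0):
    """Count how many of p, p', p'', ... vanish at z before the first nonzero one.

    Assumes p is not the zero polynomial.
    """
    m = 0
    while coeffs and abs(horner(coeffs, z)) <= tol:
        m += 1
        coeffs = derivative(coeffs)
    return m

p = [1, 1, -1, -1]           # (z - 1)(z + 1)^2 = z^3 + z^2 - z - 1
print(multiplicity(p, -1), multiplicity(p, 1), multiplicity(p, 2))
```

A result of 0 means that the given point is not a zero of the polynomial at all.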


Problems 
24. The polynomial
$$p(z) = z^5 - z^4 - 8z^3 + 20z^2 - 17z + 5$$


has the zero z = 1. Determine its multiplicity.
25. What are the multiplicities of the zeros at z = ±i of the polynomial


$$p(z) = z^5 + z^4 + 2z^3 + 2z^2 + z + 1?$$


26. Prove: Complex conjugate zeros of real polynomials have equal
multiplicities. (Hint: The derivatives of a real polynomial are real
polynomials. Now use the theorems 2.5d and 2.6.)

27. Using the definition (2-38) of the derivative, prove Taylor’s theorem for
arbitrary polynomials. That is, show that if p is a polynomial of degree n
then for arbitrary z and h


$$p(z + h) = p(z) + \frac{h}{1!}\,p'(z) + \frac{h^2}{2!}\,p''(z) + \cdots + \frac{h^n}{n!}\,p^{(n)}(z).$$


(Use the representation of example 22 for the kth derivative of p.)


Recommended Reading 


A more thorough treatment of complex numbers and polynomials 
will be found in Birkhoff and MacLane [1953]. 


chapter 3 difference equations 


Many algorithms of numerical analysis consist in determining solutions of 
difference equations. In addition, difference equations play an important 
part in other branches of pure and applied mathematics such as com- 
binatorial analysis, the theory of probability, and mathematical economics. 

Many aspects of the theory of difference equations are similar to certain 
aspects of the theory of differential equations. We thus begin this chapter 
with a brief review of the idea of a differential equation. 


3.1 Differential Equations


Suppose f = f(x, y, z) is a real valued function defined for all x in an
interval I = [a, b], and for all y and z lying in certain sets of real numbers
$S_0$ and $S_1$. The sets $S_0$ and $S_1$ may depend on x. The following problem
is called a differential equation of the first order: To find a function
y = y(x), differentiable for x ∈ I, such that, for all x in I,


(i) $y(x) \in S_0$, $y'(x) \in S_1$;
(ii) $f(x, y(x), y'(x)) = 0$.


The essential condition here is (ii); condition (i) is merely imposed in order
that condition (ii) makes sense. The problem thus defined is symbolically
denoted by


$$f(x, y, y') = 0; \tag{3-1}$$

any function y = y(x) satisfying the conditions (i) and (ii) is called a
solution of the differential equation (3-1).

EXAMPLES 

1. Let I, $S_0$, $S_1$ all be the sets of all real numbers, and let


$$f(x, y, z) = z - ky,$$


where k is a constant. Every function
$$y(x) = Ce^{kx},$$


where C is an arbitrary constant, is a solution of the resulting differential
equation

$$y' = ky.$$
2. Let I be the set of all reals, and let $S_0 = S_1 = [-1, 1]$. If


$$f(x, y, z) = z^2 - (1 - y^2),$$
there results the differential equation
$$y'^2 = 1 - y^2,$$
solutions of which are given by the functions
$$y(x) = \sin(x - a),$$

where a is again arbitrary.

More generally, we consider differential equations of order N, where N
is a positive integer. Let $f = f(x, y_0, y_1, \ldots, y_N)$ be a real valued function
defined for x ∈ I and for all $y_0, y_1, \ldots, y_N$ in certain sets of real numbers
$S_0, S_1, \ldots, S_N$. The problem of finding a function y = y(x) defined for
x ∈ I, having N derivatives on I, and satisfying the conditions
(i) $y^{(k)}(x) \in S_k$, k = 0, 1, ..., N;
(ii) $f(x, y(x), y'(x), \ldots, y^{(N)}(x)) = 0$
for all x in I, is called a differential equation of order N, and is symbolically
denoted by

$$f(x, y, y', \ldots, y^{(N)}) = 0.$$
EXAMPLE
3. Let N = 2, and let I, $S_0$, $S_1$, and $S_2$ be the set of all real numbers. If


$$f(x, y_0, y_1, y_2) = y_0 + y_2,$$
every function of the form
$$y(x) = A\cos x + B\sin x,$$


where A and B are constants, is a solution of the resulting differential
equation $y'' + y = 0$.


The problems studied in the theory of differential equations not only
concern the analytic representation of some or all of their solutions, but
also the general behavior of these solutions, especially when x tends to
infinity.


3.2 Difference Equations 


The fundamental problem in the theory of difference equations is in
many ways similar to that of the theory of differential equations, with the
exception that the mathematical object sought here is a sequence rather
than a function. From the abstract point of view, a sequence is merely a
function defined on a set of integers. More concretely, a sequence s is
defined by associating with every member of a set I of integers a certain
(real) number. Traditionally, the number associated with the integer n
is denoted by a symbol such as $s_n$ and not by the usual functional notation
s(n).

In almost all applications, the domain of definition of a sequence is a
set of consecutive integers, i.e., the set of all integers contained between
two fixed limits, one or both of which may be infinite. Frequently I is the
set of all nonnegative integers.

Sequences can be denoted by a single letter such as s, much as functions
are denoted by single letters such as f. More commonly, however, the
sequence with the general element $s_n$ is written either as $s_0, s_1, s_2, \ldots$ or
simply as $\{s_n\}$.

A sequence can be defined by giving an explicit formula for s,, such as 


: n> 21. 


i Tee = 


More often than not, however, the elements of a sequence are defined only 
implicitly. Difference equations are among the most important tools for 
defining sequences. 

We first explain what is meant by a difference equation of order 1. Let
$f = \{f_n(y, z)\}$ be a sequence of functions defined on a set I of consecutive
integers and for all y and z belonging to some set of real numbers S. The
problem is to find a sequence $\{x_n\}$ of real numbers defined on a set
containing I so that the following conditions are satisfied for all n ∈ I:


(i) $x_n \in S$, $x_{n-1} \in S$;
(ii) $f_n(x_n, x_{n-1}) = 0$.


A sequence $\{x_n\}$ satisfying these conditions is called a solution of the
difference equation symbolized by equation (ii).

EXAMPLES 

4. Let I be the set of all integers. The choice $f_n(y, z) = y - z - 1$
yields the difference equation


$$x_n - x_{n-1} = 1.$$


Obvious solutions are given by $x_n = n + c$ for every constant c.

5. Let I be the set of nonnegative integers, $f_n(y, z) = y - z - n$. There
results the difference equation


$$x_n - x_{n-1} = n,$$
having (among others) the solution
$$x_n = \frac{n(n + 1)}{2}.$$
6. Let I be the set of all integers, and let $f_n(y, z) = y - qz$, where q is a
nonzero constant. We obtain the difference equation

$$x_n = q x_{n-1}.$$
A solution satisfying $x_0 = 1$ is given by $x_n = q^n$.


The difference equation of order N, where N is a positive integer, is
defined in a similar manner. Let $f = \{f_n(y_0, y_1, \ldots, y_N)\}$ be a sequence
defined on a set I of consecutive integers, whose elements are functions
defined for $y_k$ (k = 0, 1, ..., N) in a set of real numbers S. The problem
is to find a sequence $\{x_n\}$, defined on a set containing I and satisfying the
following conditions for all n ∈ I:


(i) $x_{n-k} \in S$, k = 0, 1, ..., N;
(ii) $f_n(x_n, x_{n-1}, \ldots, x_{n-N}) = 0$.
A sequence $\{x_n\}$ with these properties is again called a solution of the
difference equation symbolized by (ii).
EXAMPLES
7. Let I be the set of all integers, $f_n(y_0, y_1, y_2) = y_0 - 2\cos\varphi\,y_1 + y_2$,
where φ is real. A solution of the resulting difference equation
$$x_n - 2\cos\varphi\,x_{n-1} + x_{n-2} = 0$$


is given by $x_n = \cos(n\varphi)$.
8. Let I be the set of nonnegative integers, $f_n(y_0, y_1, y_2) = y_0 - y_1 - y_2$.
Can you find a solution of the difference equation


$$x_n - x_{n-1} - x_{n-2} = 0?$$

As in the case of differential equations, the problems studied in the 
theory of difference equations not only concern the analytic representation 
of some or all solutions, but also the general behavior of these solutions, 
especially when n tends to infinity.

A differential equation usually has many solutions. In order to pin 
down a solution, we have to specify some additional property of it, such
as its value, and perhaps the values of some of its derivatives, at a given
point. The problem of finding a solution of the differential equation
under these side conditions is called an initial value problem. Similarly,
in order to pin down the solution of a difference equation, we have to
prescribe the values of some elements of the solution $\{x_n\}$. For instance,
in example 8 above, we may require that $x_0 = x_1 = 1$.

The solution of an initial value problem for a difference equation is, in a
way, a trivial matter. We assume that the equation


$$f_n(y_0, y_1, \ldots, y_N) = 0$$


can be solved for $y_0$,


$$y_0 = g_n(y_1, y_2, \ldots, y_N).$$


The difference equation can then be written as a recurrence relation for
the elements of the sequence $\{x_n\}$,


$$x_n = g_n(x_{n-1}, x_{n-2}, \ldots, x_{n-N}). \tag{3-2}$$


Once N consecutive elements of the solution are known, further elements
can be obtained by successively evaluating the functions $g_n$. In this
manner we find for the difference equation of example 8, with $x_0 = x_1 = 1$,
the solution


$$1, 1, 2, 3, 5, 8, 13, \ldots.$$


There remains the problem, however, of finding an explicit formula for $x_n$,
and of determining all solutions of a given difference equation.
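
The successive evaluation of a recurrence relation is easily mechanized. The following Python sketch is a minimal illustration; the function name `solve_recurrence` and its calling convention (the function g receives n followed by the N preceding elements, most recent first) are assumptions of this example.

```python
def solve_recurrence(g, initial, count):
    """Return x_0, ..., x_{count-1} for x_n = g(n, x_{n-1}, ..., x_{n-N}).

    `initial` holds the N starting values x_0, ..., x_{N-1}.
    """
    xs = list(initial)
    order = len(xs)
    for n in range(order, count):
        previous = xs[-1:-order - 1:-1]     # x_{n-1}, x_{n-2}, ..., x_{n-N}
        xs.append(g(n, *previous))
    return xs

# Example 8 with x_0 = x_1 = 1: x_n = x_{n-1} + x_{n-2}.
fib = solve_recurrence(lambda n, a, b: a + b, [1, 1], 8)

# Example 5 with x_0 = 0: x_n = x_{n-1} + n, whose solution is n(n + 1)/2.
tri = solve_recurrence(lambda n, a: a + n, [0], 6)
print(fib, tri)
```

Such a computation produces as many elements as desired but, as the text points out, it does not by itself yield an explicit formula for the general element.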


3.3 Linear Difference Equations of Order One 


A difference equation is called linear, if, for each n ∈ I, the function $f_n$
is a linear function of $y_0, y_1, \ldots, y_N$. The coefficients in that linear
function may, however, depend on n. That is, for certain sequences $\{a_{0,n}\}$,
$\{a_{1,n}\}, \ldots, \{a_{N,n}\}$, and $\{b_n\}$ we have
$$f_n(y_0, y_1, \ldots, y_N) = a_{0,n} y_0 + a_{1,n} y_1 + \cdots + a_{N,n} y_N + b_n. \tag{3-3}$$
EXAMPLES
9. The difference equations considered in the examples 4 through 8 are
linear.

10. The difference equation
$$x_n + 5n\,x_{n-1} + n^2 x_{n-2} = 2$$
is linear.
11. The difference equation
$$x_n - 2x_{n-1}^2 = 0$$
is not linear.


difference equations 49 


A linear difference equation is called homogeneous, if $f_n(0, 0, \ldots, 0) = 0$
for all $n \in I$, i.e., if the sequence $\{b_n\}$ has zero elements only. The linear
difference equations considered in the examples 6, 7, and 8 are homogeneous.
The equation of example 10 is not homogeneous. Likewise
the equation of example 11 is not homogeneous, because it is not linear.

Postponing the discussion of linear difference equations of order
$N > 1$ until chapter 6, we shall in the present section consider linear
difference equations of order 1 only. Assuming that $a_{0,n} \ne 0$, $n \in I$, we
may divide through by $a_{0,n}$ and write any such difference equation in the
form (3-2), viz.,

(3-4)  $x_n = a_n x_{n-1} + b_n.$

We shall assume that $I$ is the set of positive integers and shall try to find a
closed expression for the solution of (3-4) satisfying an arbitrary initial
condition $x_0 = c$.

We first consider the homogeneous equation

(3-5)  $x_n = a_n x_{n-1}.$

It is seen immediately that the solution satisfying $x_0 = c$ is given by

(3-6)  $x_n = c\,\pi_n,$

where

(3-7)  $\pi_0 = 1, \quad \pi_n = a_1 a_2 \cdots a_n, \quad n = 1, 2, \ldots.$


To find the solution of the nonhomogeneous equation (3-4) satisfying
$x_0 = c$, we use the method of variation of constants familiar from the
theory of ordinary differential equations. Thus we set

(3-8)  $x_n = c_n \pi_n,$

where the sequence $\{c_n\}$ is to be determined. In view of $x_0 = c_0 \pi_0 = c_0$
the initial condition yields $c_0 = c$. Substituting (3-8) into (3-4) we find

$c_n \pi_n = a_n c_{n-1} \pi_{n-1} + b_n = c_{n-1} \pi_n + b_n.$

We assume that $a_n \ne 0$, $n = 1, 2, \ldots$, which implies that $\pi_n \ne 0$. The
last relation then yields

$c_n - c_{n-1} = \frac{b_n}{\pi_n},$

and it follows that

$c_n = c_0 + (c_1 - c_0) + (c_2 - c_1) + \cdots + (c_n - c_{n-1})$
$\quad = c + \frac{b_1}{\pi_1} + \frac{b_2}{\pi_2} + \cdots + \frac{b_n}{\pi_n}.$

For later reference we summarize the above as follows:


Theorem 3.3 Let $a_n \ne 0$, $n = 1, 2, \ldots$. The solution of the
difference equation

$x_n = a_n x_{n-1} + b_n$

satisfying $x_0 = c$ is given by

(3-9)  $x_n = \pi_n \left( c + \frac{b_1}{\pi_1} + \frac{b_2}{\pi_2} + \cdots + \frac{b_n}{\pi_n} \right),$

where $\pi_0 = 1$, $\pi_n = a_1 a_2 \cdots a_n$, $n = 1, 2, \ldots$.


We observe that this solution can be considered as being composed of
the sequence $\{c\,\pi_n\}$, which is the solution of the homogeneous equation
having initial value $c$, and of the sequence

$\left\{ \pi_n \left( \frac{b_1}{\pi_1} + \frac{b_2}{\pi_2} + \cdots + \frac{b_n}{\pi_n} \right) \right\},$

which is the solution of the nonhomogeneous equation having initial value
zero. The principle of superposition familiar from the theory of
differential equations thus prevails also for difference equations.

Whether or not the solution (3-9) can be expressed in simple form
depends on the possibility of expressing the products $\pi_n$ and the sums $c_n$ in
simple form, much as in the case of differential equations the simplicity
of the solution depends on the possibility of evaluating certain integrals.
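As a numerical check on theorem 3.3, the following minimal Python sketch compares direct iteration of (3-4) with the closed form (3-9). The function names and the sample coefficients $a_n = n + 1$, $b_n = n^2$, $c = 3$ are illustrative assumptions, not taken from the text; exact rational arithmetic is used so that the two results can be compared for equality.

```python
from fractions import Fraction

def solve_by_recurrence(a, b, c, n):
    """Iterate x_k = a(k) * x_{k-1} + b(k), starting from x_0 = c, up to x_n."""
    x = Fraction(c)
    for k in range(1, n + 1):
        x = a(k) * x + b(k)
    return x

def solve_by_formula(a, b, c, n):
    """Evaluate (3-9): x_n = pi_n * (c + b_1/pi_1 + ... + b_n/pi_n)."""
    pi = Fraction(1)        # pi_0 = 1
    total = Fraction(c)     # running value of c + b_1/pi_1 + ... + b_m/pi_m
    for m in range(1, n + 1):
        pi *= a(m)          # pi_m = a_1 a_2 ... a_m, requires a_m != 0
        total += Fraction(b(m)) / pi
    return pi * total

# Hypothetical sample data (an assumption for illustration only).
a = lambda n: n + 1
b = lambda n: n * n
c = 3

for n in range(6):
    print(n, solve_by_recurrence(a, b, c, n), solve_by_formula(a, b, c, n))
```

With constant coefficients $a_n = 2$, $b_n = 1$, $c = 0$, both functions return $2^n - 1$, in agreement with summing the geometric series in (3-9).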


Problems 
1. Let —1 < a < 1. Solve the difference equation 
Xa ΞΞ ἃ, = OX, 7 1 


and determine the limit of the sequence $\{x_n\}$ as $n \to \infty$.
2. Let $z$ be an arbitrary real number, $z \ne 0$. Solve the difference equation


iH 
Xo = 1, xn ΞΞ need 1, 


Show that 


3. Let the infinite series

$\sum_{n=0}^{\infty} b_n$

be convergent, and let its sum be $s$. Show that

$s = \lim_{n \to \infty} x_n,$

where $\{x_n\}$ denotes the solution of the difference equation

$x_0 = b_0, \quad x_n = x_{n-1} + b_n, \quad n = 1, 2, \ldots.$

4. It is shown in calculus that

$\prod_{n=1}^{\infty} \left( 1 - \frac{z^2}{n^2} \right) = \frac{\sin \pi z}{\pi z}.$

Show that the infinite product can be obtained as $\lim_{n \to \infty} x_n$, where $\{x_n\}$
satisfies the linear and homogeneous difference equation

$x_0 = 1, \quad x_n = \left( 1 - \frac{z^2}{n^2} \right) x_{n-1}, \quad n = 1, 2, \ldots.$


3.4 Horner’s Scheme 


Several applications of theorem 3.3 will be made in later chapters. At 
this point we wish to show how difference equations can be used to cal- 
culate the values of a polynomial. 

Let $b_0, b_1, \ldots, b_N$ be $N + 1$ given constants, and let $z$ be a given real
number. We define


Algorithm 3.4 Calculate the numbers $x_0, x_1, \ldots, x_N$ recursively by
the relations

(3-10)  $x_0 = b_0, \quad x_n = z x_{n-1} + b_n, \quad n = 1, 2, \ldots, N.$


Evidently, the finite sequence $\{x_n\}$ defined in algorithm 3.4 is the solution
satisfying $x_0 = b_0$ of a difference equation (3-4), where $a_n = z$,
$n = 1, 2, \ldots, N$. In this special case, $\pi_n = z^n$ and hence, by virtue of theorem
3.3,

$x_n = b_0 z^n + b_1 z^{n-1} + \cdots + b_n,$

and in particular for $n = N$,

$x_N = b_0 z^N + b_1 z^{N-1} + \cdots + b_N.$

We thus have obtained:


Theorem 3.4 The numbers $x_n$ produced by algorithm 3.4 equal the
values of the polynomials $p_n$ defined by

$p_n(x) = b_0 x^n + b_1 x^{n-1} + \cdots + b_n, \quad n = 0, 1, \ldots, N,$

at $x = z$.
Algorithm 3.4 may thus be regarded as a method for evaluating a
polynomial. This method is generally known as Horner’s rule. It
requires $n$ multiplications and $n$ additions for the evaluation of a
polynomial of degree $n$, whereas the “straightforward” method (building up
the powers of $x$ recursively by $x^n = x \cdot x^{n-1}$ and subsequently multiplying
by the $b_n$) normally requires $2n - 1$ multiplications and $n$ additions.
Schematically, Horner’s method may be indicated thus:

$\begin{array}{ccccccccc}
b_0 & & b_1 & & b_2 & & \cdots & & b_N \\
 & & + & & + & & & & + \\
x_0 & \Rightarrow & x_1 & \Rightarrow & x_2 & \Rightarrow & \cdots & \Rightarrow & x_N
\end{array}$

(The symbol $\Rightarrow$ indicates multiplication by $z$.)
Scheme 3.4
EXAMPLE
12. To evaluate the polynomial
$p(x) = 7x^4 + 5x^3 - 2x^2 + 8$
at $x = 0.5$. Scheme 3.4 yields (note that the coefficient $b_3 = 0$ may not
be suppressed!)

$\begin{array}{ccccc}
7 & 5 & -2 & 0 & 8 \\
7 & 8.5 & 2.25 & 1.125 & 8.5625
\end{array}$

It follows that $p(0.5) = 8.5625$.
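Algorithm 3.4 can be sketched in a few lines of Python. The function name `horner` and the list-based interface are assumptions made for illustration; the coefficients are passed in the order $b_0, b_1, \ldots, b_N$, and the function returns the whole sequence $x_0, \ldots, x_N$ so that the intermediate values of scheme 3.4 remain visible.

```python
def horner(coeffs, z):
    """Return [x_0, ..., x_N] from (3-10) for coeffs = [b_0, ..., b_N]."""
    xs = [coeffs[0]]                 # x_0 = b_0
    for b in coeffs[1:]:
        xs.append(z * xs[-1] + b)    # x_n = z * x_{n-1} + b_n
    return xs

# Example 12: p(x) = 7x^4 + 5x^3 - 2x^2 + 0x + 8 at x = 0.5.
values = horner([7, 5, -2, 0, 8], 0.5)
print(values)        # bottom row of scheme 3.4
print(values[-1])    # p(0.5)
```

The last element of the returned list is the value of the polynomial; the zero coefficient $b_3$ has to appear explicitly in the input list, just as in the scheme.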


3.5 Binomial Coefficients 


We wish to obtain a generalization of algorithm 3.4 and recall to this
end some facts about binomial coefficients.
The binomial coefficients

$\binom{n}{m}$  (to be read: “$n$ over $m$”)

are for integral $n$ and $m$, $n \ge 0$, $0 \le m \le n$, defined by the expansion

$(1 + x)^n = \sum_{m=0}^{n} \binom{n}{m} x^m.$

It is well known that

(3-11)  $\binom{n}{m} = \frac{n(n-1)(n-2) \cdots (n-m+1)}{1 \cdot 2 \cdot 3 \cdots m} = \frac{n!}{m!\,(n-m)!}.$

From (3-11) it follows immediately that

(3-12)  $\binom{n}{m} = \binom{n}{n-m}.$

The identity

(3-13)  $\binom{n}{m} = \binom{n-1}{m-1} + \binom{n-1}{m}$

likewise is easily proved. It states that if the binomial coefficients are
arranged in a triangular array known as Pascal’s triangle,
then each entry in this array is equal to the sum of the two entries
immediately above it. We also shall require the following slightly less
obvious property of the binomial coefficients.


Theorem 3.5 For any two nonnegative integers $k$ and $n \ge k$ the
following identity holds:

$\binom{k}{k} + \binom{k+1}{k} + \binom{k+2}{k} + \cdots + \binom{n}{k} = \binom{n+1}{k+1}.$
Proof. Keeping $k$ fixed, we use induction with respect to $n$. Set

$x_n = \binom{k}{k} + \binom{k+1}{k} + \cdots + \binom{n}{k}.$

Assuming that for some integer $n - 1 \ge k$

(3-14)  $x_{n-1} = \binom{n}{k+1},$

which is certainly true for $n - 1 = k$, we have, using (3-13),

$x_n = x_{n-1} + \binom{n}{k} = \binom{n}{k+1} + \binom{n}{k} = \binom{n+1}{k+1},$

proving (3-14) with $n$ increased by one, which suffices to establish the
formula for all positive integers $n$.

EXAMPLE

13. For $k = 2$, $n = 5$ we get

$\binom{2}{2} + \binom{3}{2} + \binom{4}{2} + \binom{5}{2} = 1 + 3 + 6 + 10 = 20 = \binom{6}{3}.$
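The identity of theorem 3.5 is easy to spot-check numerically. The following is a small sketch, assuming Python 3.8 or later for `math.comb`; the helper name `column_sum` is hypothetical.

```python
from math import comb

def column_sum(k, n):
    """Left-hand side of theorem 3.5: C(k,k) + C(k+1,k) + ... + C(n,k)."""
    return sum(comb(m, k) for m in range(k, n + 1))

# Example 13: k = 2, n = 5.
print(column_sum(2, 5), comb(6, 3))

# Compare both sides of the identity over a small range of k and n.
identity_holds = all(
    column_sum(k, n) == comb(n + 1, k + 1)
    for k in range(0, 8)
    for n in range(k, 20)
)
print(identity_holds)
```

Such a check does not replace the induction proof, but it guards against misreading the indices in the statement.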


Problems 


5. Use the definition of the binomial coefficients to prove that

$\binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n.$

6. Obtain an alternate proof of theorem 3.5 by differentiating the identity

$(x - 1)(1 + x + x^2 + \cdots + x^n) = x^{n+1} - 1$

$k + 1$ times and setting $x = 1$.
7. Show by means of theorem 3.3 that the difference initial value problem 


A+na 
yo = 1, ee | n ) 


has the solution 


meal) CE) Coe C29) 


Show that the same initial value problem is also solved by 
ἡ βαρ ἘΦ ἢ 
i . n 


and hence obtain yet another proof of theorem 3.5. (Use (3-12).) 


3.6 Evaluation of the Derivatives of a Polynomial 


One is frequently required not only to find the value of a polynomial, 
but also the values of one or several derivatives at a given point. For 
instance, if the polynomial 


$p(x) = b_0 x^N + b_1 x^{N-1} + \cdots + b_N$

is to be expanded in powers of a new variable $h = x - z$, then we have
by Taylor’s theorem (see Taylor [1959], p. 471)

$p(x) = p(z + h) = c_0 h^N + c_1 h^{N-1} + \cdots + c_N,$

where

(3-15)  $c_{N-k} = \frac{1}{k!}\, p^{(k)}(z), \quad k = 0, 1, \ldots, N.$


These coefficients could be evaluated, of course, by differentiating p 
analytically and evaluating the resulting polynomials by Horner’s 
algorithm 3.4. It turns out, however, that there is a far shorter way. 
To this end we consider the following extension of algorithm 3.4. 


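Algorithm 3.6 Let the sequence generated in algorithm 3.4 now be
denoted by $\{x_n^{(0)}\}$. For $k = 1, 2, \ldots, N$, generate the sequences
$\{x_n^{(k)}\}$ recursively by the difference equation

(3-16)  $x_0^{(k)} = x_0^{(k-1)}, \quad x_n^{(k)} = z x_{n-1}^{(k)} + x_n^{(k-1)}, \quad n = 1, 2, \ldots, N - k.$


Each of the sequences $\{x_n^{(k)}\}$ is generated from the preceding sequence
$\{x_n^{(k-1)}\}$ just as the sequence $\{x_n^{(0)}\}$ was generated from the sequence
$\{b_n\}$. However, the new sequence always terminates one step before the
old sequence. Scheme 3.6 indicates the resulting two-dimensional array
for $N = 4$.

$\begin{array}{ccccccccc}
b_0 & & b_1 & & b_2 & & b_3 & & b_4 \\
x_0^{(0)} & \Rightarrow & x_1^{(0)} & \Rightarrow & x_2^{(0)} & \Rightarrow & x_3^{(0)} & \Rightarrow & \underline{x_4^{(0)}} \\
x_0^{(1)} & \Rightarrow & x_1^{(1)} & \Rightarrow & x_2^{(1)} & \Rightarrow & \underline{x_3^{(1)}} & & \\
x_0^{(2)} & \Rightarrow & x_1^{(2)} & \Rightarrow & \underline{x_2^{(2)}} & & & & \\
x_0^{(3)} & \Rightarrow & \underline{x_1^{(3)}} & & & & & & \\
\underline{x_0^{(4)}} & & & & & & & &
\end{array}$

(The symbol $\Rightarrow$ indicates multiplication by $z$.)
Scheme 3.6


The relevance of this scheme to the problem stated above is based on the
following fact:


Theorem 3.6 If $p_n$ denotes the polynomial

$p_n(x) = b_0 x^n + b_1 x^{n-1} + \cdots + b_n, \quad n = 0, 1, \ldots, N,$

then the numbers $x_n^{(k)}$ determined in algorithm 3.6 satisfy

(3-17)  $x_n^{(k)} = \frac{1}{k!}\, p_{n+k}^{(k)}(z), \quad k = 0, 1, \ldots, N; \quad n = 0, 1, \ldots, N - k.$


Thus, the coefficients $c_{N-k}$ required in (3-15) are just the entries in the
lower diagonal (underlined in scheme 3.6) in the array of numbers
generated by algorithm 3.6.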


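Proof. Theorem 3.4 asserts the truth of formula (3-17) for $k = 0$. We
use induction with respect to $k$ to prove it for arbitrary $k$. Assuming
that (3-17) is true for a certain $k \ge 0$, we can write the difference equation
(3-16) defining the sequence $\{x_n^{(k+1)}\}$ as follows:

$x_0^{(k+1)} = \frac{1}{k!}\, p_k^{(k)}(z), \quad x_n^{(k+1)} = z x_{n-1}^{(k+1)} + \frac{1}{k!}\, p_{n+k}^{(k)}(z).$

Expressing the solution of this equation by means of formula (3-9), where

$a_n = z, \quad \pi_n = z^n, \quad b_n = \frac{1}{k!}\, p_{n+k}^{(k)}(z),$

we obtain

$x_n^{(k+1)} = \frac{z^n}{k!} \left\{ p_k^{(k)}(z) + z^{-1} p_{k+1}^{(k)}(z) + z^{-2} p_{k+2}^{(k)}(z) + \cdots + z^{-n} p_{n+k}^{(k)}(z) \right\}.$

Here we express the polynomial derivatives by means of the formula of
example 22, chapter 2:

$\frac{1}{k!}\, p_m^{(k)}(z) = \binom{m}{k} b_0 z^{m-k} + \binom{m-1}{k} b_1 z^{m-k-1} + \cdots + \binom{k}{k} b_{m-k}.$

This yields the expression

$x_n^{(k+1)} = z^n \left\{ \binom{k}{k} b_0 + z^{-1} \left[ \binom{k+1}{k} b_0 z + \binom{k}{k} b_1 \right] + \cdots \right.$
$\qquad \left. + z^{-n} \left[ \binom{n+k}{k} b_0 z^n + \binom{n+k-1}{k} b_1 z^{n-1} + \cdots + \binom{k}{k} b_n \right] \right\}.$

By rearranging the sum by collecting terms involving like powers of $z$ this
can be transformed into

$x_n^{(k+1)} = \left[ \binom{k}{k} + \binom{k+1}{k} + \cdots + \binom{n+k}{k} \right] b_0 z^n$
$\qquad + \left[ \binom{k}{k} + \binom{k+1}{k} + \cdots + \binom{n+k-1}{k} \right] b_1 z^{n-1} + \cdots + \binom{k}{k} b_n.$

By using the identity of theorem 3.5, this simplifies into

$x_n^{(k+1)} = \binom{n+k+1}{k+1} b_0 z^n + \binom{n+k}{k+1} b_1 z^{n-1} + \cdots + \binom{k+1}{k+1} b_n$
$\qquad = \frac{1}{(k+1)!}\, p_{n+k+1}^{(k+1)}(z),$

which is (3-17) with $k$ increased by one.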


EXAMPLE
14. To compute the Taylor expansion of the polynomial

$p(x) = 7x^4 + 5x^3 - 2x^2 + 8$

at $x = 0.5$. Continuing the scheme started in example 12, we obtain

$\begin{array}{ccccc}
7 & 5 & -2 & 0 & 8 \\
7 & 8.5 & 2.25 & 1.125 & \underline{8.5625} \\
7 & 12 & 8.25 & \underline{5.25} & \\
7 & 15.5 & \underline{16} & & \\
7 & \underline{19} & & & \\
\underline{7} & & & &
\end{array}$

We thus have

$7x^4 + 5x^3 - 2x^2 + 8 = 7(x - 0.5)^4 + 19(x - 0.5)^3$
$\qquad + 16(x - 0.5)^2$
$\qquad + 5.25(x - 0.5) + 8.5625.$
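Algorithm 3.6 amounts to repeating the Horner pass on a row that shrinks by one element each time. The sketch below is one possible arrangement, with the hypothetical name `taylor_coefficients`; it returns the coefficients in the order $c_0, c_1, \ldots, c_N$ of the expansion $p(z + h) = c_0 h^N + \cdots + c_N$.

```python
def taylor_coefficients(coeffs, z):
    """Return [c_0, ..., c_N] with p(z + h) = c_0 h^N + ... + c_N.

    coeffs = [b_0, ..., b_N] are the coefficients of p in descending powers.
    """
    degree = len(coeffs) - 1
    row = list(coeffs)                 # plays the role of {b_n} in the first pass
    c = [None] * (degree + 1)
    for k in range(degree + 1):
        # One Horner pass (3-16) over the current row gives x_n^(k), n = 0..N-k.
        new_row = [row[0]]
        for value in row[1:]:
            new_row.append(z * new_row[-1] + value)
        c[degree - k] = new_row[-1]    # underlined entry x_{N-k}^(k) = c_{N-k}
        row = new_row[:-1]             # the next sequence ends one step earlier
    return c

# Example 14: p(x) = 7x^4 + 5x^3 - 2x^2 + 8 expanded about z = 0.5.
c = taylor_coefficients([7, 5, -2, 0, 8], 0.5)
print(c)
```

Each pass reuses the previous row, so all $N + 1$ Taylor coefficients cost about $N(N+1)/2$ multiplications, with no symbolic differentiation.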


Problems 
8. Find the Taylor expansion of the polynomial 


p(x) = 4x8 — 5x τ 


atx = —0.3. 
9. Calculate $p(x)$, $p'(x)$, $p''(x)$ for $x = 1.5$, where

$p(x) = 2x^5 - 7x^4 + 3x^3 - 6x^2 + 4x - 5.$


PART ONE


SOLUTION OF EQUATIONS


chapter 4 iteration


In this chapter we shall begin the study of the problem of solving
(nonlinear) equations in one and several variables. The method of iteration is
the prototype of all numerical methods for attacking this problem. It is
based on the solution of a certain nonlinear difference equation of the
first order. The principle of iteration is of great importance also in
certain branches of theoretical mathematics such as functional analysis
and the theory of differential equations.


4.1 Definition and Hypotheses 


We are here concerned with the problem of solving equations of the form

(4-1)  $x = f(x).$


Algorithm 4.1 Choose $x_0$ arbitrarily, and generate the sequence
$\{x_n\}$ recursively from the relation

(4-2)  $x_n = f(x_{n-1}), \quad n = 1, 2, \ldots.$


At the outset, we cannot even be sure that this algorithm is well defined.
(It could be that $f$ is undefined at some point $f(x_n)$.) However, let us
assume that $f$ is defined on some closed finite interval $I = [a, b]$, and that
the values of $f$ lie in the same interval. Geometrically this means that the
graph of the function $y = f(x)$ is contained in the square $a \le x \le b$,
$a \le y \le b$ (see Fig. 4.1a).
Under this assumption, if $x_0 \in I$, we can say that all elements of the
sequence $\{x_n\}$ are in $I$. For if some $x_n \in I$ with $n \ge 0$, then also
$x_{n+1} = f(x_n) \in I$, since $f$ has its values in $I$.
A glance at figure 4.1a shows that the above hypotheses are not sufficient
to guarantee that equation (4-1) has a solution. The graph of the function
$y = f(x)$ need not intersect the graph of the function $y = x$. However, if
we assume the function $f$ to be continuous, then the graph shows that the
equation has at least one solution. For the graph of $y = f(x)$ originates
somewhere on the vertical straight line segment joining the points $(a, a)$
and $(a, b)$, and it ends somewhere on the straight line segment joining
$(b, a)$ and $(b, b)$. Since the graph is now continuous, it must intersect
the straight line $y = x$ $(a \le x \le b)$, perhaps at an endpoint. If $s$ is the
abscissa of the point of intersection, then

$y = s$ and $y = f(s)$

at that point, hence $s = f(s)$, i.e., the number $s$ is a solution of (4-1) (see
Fig. 4.1b).

Figure 4.1

The above intuitive consideration can be couched in purely analytical
terms, as follows. Consider the function $g$ defined by $g(x) = x - f(x)$,
$a \le x \le b$. This function is continuous on the interval $[a, b]$; moreover,
since $f(a) \ge a$, $f(b) \le b$, it satisfies $g(a) \le 0$, $g(b) \ge 0$. By the
intermediate value theorem of calculus (see Taylor [1959], p. 240) it assumes
all values between $g(a)$ and $g(b)$ somewhere in the interval $[a, b]$.
Therefore it must assume the value zero, say at $x = s$. This implies
$0 = s - f(s)$, or $s = f(s)$. Thus the number $s$ is the desired solution.

If an element of the sequence $\{x_n\}$ defined by algorithm 4.1 is equal to $s$,
then all later elements will also be equal to $s$. For this reason, any solution
$s$ of $x = f(x)$ is frequently called a fixed point of the iteration defined by
the function $f$.

The assumptions made so far do not preclude the possibility of the
existence of several, or even infinitely many, fixed points in the interval
$[a, b]$ (see Fig. 4.1c).

If we wish to be sure that there is not more than one solution, we must
make some assumption guaranteeing that the function $f$ does not vary too
rapidly. We could assume, for instance, that $f$ is differentiable, and that
its derivative $f'$ satisfies

(4-3)  $|f'(x)| \le L, \quad x \in I,$

where $L$ is some constant so that $L < 1$. It turns out, however, that the
following weaker assumption is sufficient to guarantee uniqueness: There
exists a constant $L < 1$ so that for any two points $x_1$ and $x_2$ in $I$ the following
inequality holds:

(4-4)  $|f(x_1) - f(x_2)| \le L\,|x_1 - x_2|.$

Any condition of the form (4-4) (whether $L < 1$ or not) is called a
Lipschitz condition, and the constant $L$ is called the Lipschitz constant. Let
us first show that condition (4-3) implies condition (4-4) with the same
value of $L$. By the mean value theorem of differential calculus (see
Taylor [1959], p. 240),

$f(x_1) - f(x_2) = f'(x^*)(x_1 - x_2),$

where $x^*$ is a suitable point between $x_1$ and $x_2$. Relation (4-4) now
follows readily by taking absolute values and using (4-3).

We now show that the Lipschitz condition (4-4) with $L < 1$ implies that
equation (4-1) has at most one solution. Assume that there are two
solutions $s_1$ and $s_2$, for instance. This means that the following relations
both are true:

$s_1 = f(s_1), \quad s_2 = f(s_2).$

Subtracting the second relation from the first, we obtain

$s_1 - s_2 = f(s_1) - f(s_2).$

Taking absolute values and applying (4-4) to the difference on the right,
we get

$|s_1 - s_2| = |f(s_1) - f(s_2)| \le L\,|s_1 - s_2|.$

If $s_1 \ne s_2$, we can divide by $|s_1 - s_2|$ and obtain $1 \le L$, contradicting the
assumption that $L < 1$. Thus $s_1 = s_2$, i.e., any two solutions of (4-1) are
identical.


Problems 


1. Show that the following functions satisfy Lipschitz conditions with L < 1: 


(a) f(x) = 5 — 1 οο8 3x, Of x & 27/3; 
(b) f(x) = 2 + 4IxI, “taxes 1; 
(c) Ξε χα", 2 Bow 55, 


2. Let $m$ be any real number, and let $|\varepsilon| < 1$. Show that the equation

$x = m - \varepsilon \sin x$

has a unique solution in the interval $[m - \pi, m + \pi]$.

3. Prove that any function satisfying a Lipschitz condition on an interval J 
is continuous at every point of J. 

4. Construct an example showing that not every continuous function satisfies 
a Lipschitz condition. 


4.2 Convergence of the Iteration Method 


Having disposed of the theoretical preliminaries required to establish
existence and uniqueness of the solution of (4-1), we now turn to the
practical question of determining this solution by means of algorithm 4.1.
Rather surprisingly it turns out that the assumptions which were made in
§4.1 to guarantee existence and uniqueness suffice also to establish the
fact that the sequence $\{x_n\}$ generated by algorithm 4.1 converges to the
solution $s$.


Theorem 4.2 Let $I = [a, b]$ be a closed finite interval, and let the
function $f$ satisfy the following conditions:

(i) $f$ is continuous on $I$;
(ii) $f(x) \in I$ for all $x \in I$;
(iii) $f$ satisfies the Lipschitz condition (4-4) with a Lipschitz constant
$L < 1$.

Then for any choice of $x_0 \in I$ the sequence defined by algorithm 4.1
converges to the unique solution $s$ of the equation $x = f(x)$.


Proof. That the hypotheses of theorem 4.2 guarantee the existence of a
unique solution $s$ of $x = f(x)$ has already been shown in §4.1. To prove
convergence of the sequence $\{x_n\}$, we shall estimate the difference $x_n - s$.
By definition,

$x_n - s = f(x_{n-1}) - s = f(x_{n-1}) - f(s),$

and hence, by the Lipschitz condition,

$|x_n - s| \le L\,|x_{n-1} - s|.$


Applying the same inequality repeatedly, we find

(4-5)  $|x_n - s| \le L^n |x_0 - s|.$

Since $0 \le L < 1$,

$\lim_{n \to \infty} L^n = 0,$

and it follows that

$\lim_{n \to \infty} |x_n - s| = 0,$

which is the same as

$\lim_{n \to \infty} x_n = s,$

completing the proof.

The convergence of the sequence $\{x_n\}$ is illustrated very suggestively by
plotting the points $(x_0, x_0), (x_0, x_1), (x_1, x_1), (x_1, x_2), (x_2, x_2), (x_2, x_3), \ldots$
in a graph of the function $y = f(x)$ (see Fig. 4.2). It is evident from the
figure that convergence cannot take place in general if condition (4-4)
does not hold with some $L < 1$.

Figure 4.2

EXAMPLE
1. It is desired to find a solution of the equation

$x = e^{-x}.$

For $0 \le x \le 1$, the values of the function $f(x) = e^{-x}$ lie in the interval
$[e^{-1}, 1]$, and thus a fortiori in $[0, 1]$. For $x_1$, $x_2$ in this interval, we have
by the mean value theorem

$|f(x_1) - f(x_2)| = |f'(x^*)|\,|x_1 - x_2|,$

where $f'(x^*) = -e^{-x^*}$. Since the maximum of $|f'(x)|$ for $x \in [0, 1]$ is 1,
(4-4) holds only with $L = 1$, violating the condition that $L < 1$. However,
let us consider the smaller interval $I = [\tfrac{1}{2}, \log 2]$. Since

$\tfrac{1}{2} = e^{-\log 2} \le e^{-x} \le e^{-1/2} < \log 2$

for $\tfrac{1}{2} \le x \le \log 2$, $I$ is again mapped into $I$ by the function $f$, and (4-4)
now holds with $L = |f'(\tfrac{1}{2})| = e^{-1/2} = 0.606531$. Beginning with $x_0 = 0.5$,
the first few values of the sequence $\{x_n\}$ are as follows:

Table 4.2

 n    x_n        f(x_n)     x_n'
 0    0.500000   0.606531   0.567624
 1    0.606531   0.545239   0.567299
 2    0.545239   0.579703   0.567193
 3    0.579703   0.560065   0.567159
 4    0.560065   0.571172   0.567148
 5    0.571172   0.564863   0.567145
 6    0.564863   0.568438   0.567144
 7    0.568438   0.566410   0.567144
 8    0.566410   0.567560
 9    0.567560   0.566907
10    0.566907   0.567278
11    0.567278   0.567067
12    0.567067   0.567187
13    0.567187   0.567119
14    0.567119


Problems
5. Kepler’s equation

$m = x - E \sin x,$

where $m$ and $E$ are given and $x$ is sought, plays a considerable role in
dynamical astronomy. Solve the equation iteratively for $m = 0.8$,
$E = 0.2$, by writing it in the form

$x = m + E \sin x$

and starting with $x_0 = m$.


6. The solution of Kepler’s equation can also be represented analytically in
the form

$x = m + \sum_{n=1}^{\infty} \frac{2}{n} J_n(nE) \sin nm,$

where $J_n$ denotes the Bessel function of order $n$. Using tables of Bessel
functions, check the result obtained in problem 5 and compare the amount
of labor required.

7. Find the only positive solution of the equation

$x^3 - x^2 - x - 1 = 0$

by iteration, writing it in the form

$x = 1 + \frac{1}{x} + \frac{1}{x^2}.$

(Begin with $x_0 = 1$.)

8. Assume that the function $f$, in addition to the hypotheses of theorem 4.2,
is differentiable and satisfies $f'(x) < 0$, $x \in I$. If $x_0 < s$, prove both
analytically and by considering the graph of $f$ that

$x_0 < x_2 < x_4 < \cdots < s < \cdots < x_5 < x_3 < x_1.$

9. Suppose the function $g$ is defined and differentiable on the interval $[0, 1]$,
and suppose that $g(0) < 0 < g(1)$, $0 < a \le g'(x) \le b$, where $a$ and $b$ are
constants. Show that there exists a constant $M$ such that the solution of
the equation $g(x) = 0$ can be found by applying iteration to the function

$f(x) = x + M g(x).$

10. What is the value of

$s = \sqrt{2 + \sqrt{2 + \sqrt{2 + \cdots}}}\ ?$

(Hint: The number $s$ may be considered as the limit of the sequence
$\{x_n\}$ generated by algorithm 4.1, where $f(x) = \sqrt{2 + x}$, $x_0 = 0$. Show
that $f$ satisfies, in a suitable interval, the hypotheses of theorem 4.2.)


4.3 The Error after a Finite Number of Steps 


No computing process can be carried on indefinitely, and in any
practical application algorithm 4.1 must be artificially terminated after
having computed, say, the element $x_n$. We are interested in finding a
bound for the quantity $|x_n - s|$, that is, for the error of $x_n$ considered as
an approximation to the solution $s$. This bound should depend only on
quantities that are known a priori and should not depend on a knowledge
of the solution itself. (This is why a result such as (4-5) does not satisfy
our purpose.)

To establish such a bound, we require the following auxiliary result:
For $n = 0, 1, 2, \ldots$,

(4-6)  $|x_{n+1} - x_n| \le L^n |x_1 - x_0|.$

Evidently this is true for $n = 0$. Assuming the truth of (4-6) for
$n = k - 1$, where $k$ is an integer $> 0$, we have

$|x_{k+1} - x_k| = |f(x_k) - f(x_{k-1})|$
$\qquad \le L\,|x_k - x_{k-1}|$
$\qquad \le L \cdot L^{k-1} |x_1 - x_0|$
$\qquad = L^k |x_1 - x_0|,$

establishing (4-6) for $n = k$. The truth of (4-6) for all positive integers $n$
now follows by the principle of mathematical induction.

Let $n$ now be a fixed positive integer, and let $m > n$. We shall find a
bound for $x_m - x_n$. Writing

$x_m - x_n = (x_m - x_{m-1}) + (x_{m-1} - x_{m-2}) + \cdots + (x_{n+1} - x_n)$

and applying the triangle inequality (see Taylor [1959], p. 443) we get

$|x_m - x_n| \le |x_m - x_{m-1}| + |x_{m-1} - x_{m-2}| + \cdots + |x_{n+1} - x_n|,$

and using (4-6) to estimate each term on the right,

$|x_m - x_n| \le (L^{m-1} + L^{m-2} + \cdots + L^n)\,|x_1 - x_0|.$

Now, since $|L| < 1$,

$L^{m-1} + L^{m-2} + \cdots + L^n = L^n (1 + L + \cdots + L^{m-n-1})$
$\qquad \le L^n (1 + L + L^2 + \cdots)$
$\qquad = \frac{L^n}{1 - L}$

by virtue of the familiar formula for the sum of the geometric series. We
thus have

$|x_m - x_n| \le \frac{L^n}{1 - L}\,|x_1 - x_0|.$

In this relation let $m \to \infty$ while keeping $n$ fixed. By virtue of $x_m \to s$
we obtain


Corollary 4.3 Under the conditions of theorem 4.2 the error of the
$n$th approximation $x_n$ defined by algorithm 4.1 is bounded as follows:

(4-7)  $|x_n - s| \le \frac{L^n}{1 - L}\,|x_1 - x_0|.$


EXAMPLE
2. The error of the final element $x_{14}$ produced in example 1 is less than

$\frac{L^{14}}{1 - L}\,|x_1 - x_0| = \frac{(0.606531)^{14}}{0.393469} \cdot 0.106531 \approx 0.00025.$

Actually, since $f' < 0$, it follows from problem 8 that the solution satisfies

$0.567119 < s < 0.567187.$
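The iteration of example 1 and the a priori bound (4-7) can be reproduced with a short Python sketch. The helper name `iterate` is an assumption; the function $f(x) = e^{-x}$, the starting value $x_0 = 0.5$, and the Lipschitz constant $L = e^{-1/2}$ are those of example 1.

```python
import math

def iterate(f, x0, steps):
    """Algorithm 4.1: return [x_0, x_1, ..., x_steps] with x_n = f(x_{n-1})."""
    xs = [x0]
    for _ in range(steps):
        xs.append(f(xs[-1]))
    return xs

f = lambda x: math.exp(-x)
L = math.exp(-0.5)                 # Lipschitz constant on [1/2, log 2]

xs = iterate(f, 0.5, 14)

# A priori bound (4-7) for n = 14; it uses only L, x_0 and x_1.
bound = L**14 / (1 - L) * abs(xs[1] - xs[0])

for n, x in enumerate(xs):
    print(f"{n:2d}  {x:.6f}")
print(f"bound on |x_14 - s|: {bound:.6f}")
```

The bound is pessimistic by roughly a factor of ten here, which is typical: (4-7) must hold for every function with the same $L$, not only for this one.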


Problem 


11. For Kepler’s equation considered in problem 5, estimate the error of X10. 


4.4 Accelerating Convergence


Let us now assume that the function $f$, in addition to satisfying the
hypotheses of theorem 4.2, is continuously differentiable throughout the
interval $I$, and that the derivative $f'$ is never zero. This means that $f$ is
either monotonically strictly increasing or monotonically strictly decreasing
throughout the whole interval $I$. If $x_0 \ne s$, it is evident from a graph
that under this condition no $x_n$ can be equal to the exact solution $s$, i.e.,
the iteration process cannot terminate in a finite number of steps. An
analytical proof of this fact is as follows: Assume the contrary, namely, that
$f(x_n) = x_n$ for some $n$. If $n$ is the first index for which this happens, then

$x_n = f(x_{n-1}) = f(x_n), \quad x_{n-1} \ne x_n,$

and hence, by the mean value theorem (see Taylor [1959], p. 240),

$0 = f(x_{n-1}) - f(x_n) = f'(x^*)(x_{n-1} - x_n),$

where $x^*$ is between $x_{n-1}$ and $x_n$. Since $x_{n-1} - x_n \ne 0$, it follows that
$f'(x^*) = 0$, contradicting the hypothesis that $f'$ never vanishes.
The above implies that the error

$d_n = x_n - s$

is never zero. We now ask: Does the limit

$\lim_{n \to \infty} \frac{d_{n+1}}{d_n}$

exist, and if it does, what is its value?
Using the mean value theorem once more, we have

$d_{n+1} = x_{n+1} - s$
$\qquad = f(x_n) - f(s)$
$\qquad = f(s + d_n) - f(s)$
$\qquad = f'(s + \theta_n d_n)\, d_n,$

where $0 < \theta_n < 1$. Let us define $\varepsilon_n$ by

$f'(s + \theta_n d_n) = f'(s) + \varepsilon_n.$

We then have

(4-8)  $d_{n+1} = (f'(s) + \varepsilon_n)\, d_n,$

and, since $\varepsilon_n \to 0$ for $n \to \infty$ by virtue of the continuity of $f'$,

(4-9)  $\lim_{n \to \infty} \frac{d_{n+1}}{d_n} = f'(s).$

This equation shows that the error at the $(n + 1)$st step is approximately
equal to $f'(s)$ times the error at the $n$th step. As $s$ is unknown, the
limiting ratio $f'(s)$ of two consecutive errors is, of course, unknown, and
all we really know is that the ratio of two consecutive errors approaches
some unknown limit. We now shall show, however, how to obtain a
significant improvement of the convergence of the iteration method by a
judicious use of this incomplete information.

Let us heuristically proceed under the assumption that (4-8) is exact
with $\varepsilon_n = 0$ for finite values of $n$. We then have, writing $f'(s) = A$ for
brevity,

$x_{n+1} - s = A(x_n - s),$
$x_{n+2} - s = A(x_{n+1} - s).$

It is an easy matter now to eliminate the unknown quantity $A$ and solve
for $s$. Subtracting the first equation from the second, we obtain

$x_{n+2} - x_{n+1} = A(x_{n+1} - x_n),$

hence

$A = \frac{x_{n+2} - x_{n+1}}{x_{n+1} - x_n}.$

Solving the first equation for $s$ and substituting for $A$, we get

$s = \frac{1}{1 - A}(x_{n+1} - A x_n)$
$\quad = x_n + \frac{1}{1 - A}(x_{n+1} - x_n)$
$\quad = x_n - \frac{(x_{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n}.$

If our assumption that $\varepsilon_n = 0$ were correct, we thus could obtain the exact
solution from any three consecutive iterates $x_n$, $x_{n+1}$, $x_{n+2}$. In reality, of
course, the $\varepsilon_n$ are not exactly zero: They are, however, ultimately small
compared to $f'(s)$. We thus may hope that for $n$ large the quantities

(4-10)  $x_n' = x_n - \frac{(x_{n+1} - x_n)^2}{x_{n+2} - 2x_{n+1} + x_n}$

yield a better approximation to $s$ than the quantities $x_n$. We are thus led
to investigate the properties of the following algorithm:


Algorithm 4.4 (Aitken’s $\Delta^2$-method) Given a sequence of numbers
$\{x_n\}$, generate from it a new sequence $\{x_n'\}$ by means of formula (4-10).


Formula (4-10) can be simplified somewhat by means of the difference
operator $\Delta$. If $\{x_n\}$ is any sequence, we write

$\Delta x_n = x_{n+1} - x_n, \quad n = 0, 1, 2, \ldots.$

Higher powers of the operator $\Delta$ are defined recursively. For instance,

$\Delta^2 x_n = \Delta(\Delta x_n) = \Delta x_{n+1} - \Delta x_n$
$\qquad = x_{n+2} - 2x_{n+1} + x_n.$

It is now evident that formula (4-10) can be written thus:

(4-11)  $x_n' = x_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n}.$

This notation, together with the fact that algorithm 4.4 was discussed by
Aitken [1926], explains its traditional name.


EXAMPLE

3. The last column of table 4.2 (see example 1) contains the accelerated
Aitken values of the sequence $\{x_n\}$. It is seen that after six steps the
sequence $\{x_n'\}$ has converged to the number of digits given, whereas the
original sequence $\{x_n\}$ has not even nearly converged after fourteen steps.
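Formula (4-11) translates directly into code. The sketch below, with the hypothetical helper name `aitken`, regenerates the iterates of example 1 and then the accelerated values; it assumes that $\Delta^2 x_n \ne 0$ for every $n$ used and does not guard against division by zero.

```python
import math

def aitken(xs):
    """Apply (4-11) to a list of iterates; the result is two elements shorter."""
    out = []
    for n in range(len(xs) - 2):
        d1 = xs[n + 1] - xs[n]                      # Delta x_n
        d2 = xs[n + 2] - 2 * xs[n + 1] + xs[n]      # Delta^2 x_n, assumed nonzero
        out.append(xs[n] - d1 * d1 / d2)
    return out

# Iterates of x = exp(-x) starting from x_0 = 0.5, as in example 1.
xs = [0.5]
for _ in range(14):
    xs.append(math.exp(-xs[-1]))

accelerated = aitken(xs)
for n in range(8):
    print(f"{n}  {xs[n]:.6f}  {accelerated[n]:.6f}")
```

In floating-point arithmetic the quotient in (4-11) loses accuracy once consecutive iterates agree to nearly full precision, since both $\Delta x_n$ and $\Delta^2 x_n$ then suffer cancellation; the formula is best applied before the iteration has converged.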


Problem 


12. The convergence behavior typified by (4-8) is sometimes characterized by
the statement that “the number of correct decimal digits in $x_n$ grows at a
fixed rate.” Give a quantitative interpretation of this statement, allowing
you to answer the following question: As $n \to \infty$, how many steps are
necessary (on the average) to reduce the error by a factor 3/5?


4.5 Aitken’s $\Delta^2$-Method


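After having discovered algorithm 4.4 in a heuristic manner, we now
shall present a rigorous analysis of it. This analysis applies to arbitrary
sequences having certain convergence properties and is not confined to
sequences arising through iteration. The basic result is as follows.


Theorem 4.5 Let $\{x_n\}$ be any sequence converging to the limit $s$
such that the quantities $d_n = x_n - s$ satisfy $d_n \ne 0$,

(4-12)  $d_{n+1} = (A + \varepsilon_n)\, d_n,$

where $A$ is a constant, $|A| < 1$, and $\varepsilon_n \to 0$ for $n \to \infty$. Then the
sequence $\{x_n'\}$ derived from $\{x_n\}$ by means of algorithm 4.4 is defined
for $n$ sufficiently large and converges to $s$ faster than the sequence
$\{x_n\}$ in the sense that

(4-13)  $\frac{x_n' - s}{x_n - s} \to 0 \quad \text{for } n \to \infty.$


Proof. Applying (4-12) twice, we have

$d_{n+2} = (A + \varepsilon_{n+1})(A + \varepsilon_n)\, d_n.$

Hence

$\Delta^2 x_n = x_{n+2} - 2x_{n+1} + x_n$
$\qquad = d_{n+2} - 2d_{n+1} + d_n$
$\qquad = [(A - 1)^2 + \varepsilon_n']\, d_n,$

where

$\varepsilon_n' = A(\varepsilon_n + \varepsilon_{n+1}) - 2\varepsilon_n + \varepsilon_n \varepsilon_{n+1}.$

By virtue of $\varepsilon_n \to 0$ it follows that also

(4-14)  $\varepsilon_n' \to 0, \quad n \to \infty.$

We conclude that $(A - 1)^2 + \varepsilon_n' \ne 0$ for all sufficiently large $n$, $n > n_0$, say.
It follows that $\Delta^2 x_n \ne 0$ for $n > n_0$; hence the sequence $\{x_n'\}$ is defined for
$n > n_0$. We have

$\Delta x_n = \Delta d_n = (A + \varepsilon_n - 1)\, d_n$

and hence, subtracting $s$ from (4-11),

$x_n' - s = d_n - \frac{(\Delta x_n)^2}{\Delta^2 x_n}$
$\qquad = d_n - \frac{(A - 1 + \varepsilon_n)^2}{(A - 1)^2 + \varepsilon_n'}\, d_n$
$\qquad = \frac{\varepsilon_n' - 2\varepsilon_n (A - 1) - \varepsilon_n^2}{(A - 1)^2 + \varepsilon_n'}\, d_n.$

The hypothesis that $\varepsilon_n \to 0$ and (4-14) now insure that

$\frac{x_n' - s}{x_n - s} = \frac{\varepsilon_n' - 2\varepsilon_n (A - 1) - \varepsilon_n^2}{(A - 1)^2 + \varepsilon_n'} \to 0,$

as desired.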


The result of theorem 4.5 is immediately applicable to sequences
generated by iteration, as shown by corollary 4.5.


Corollary 4.5 Let the function $f$, in addition to the hypotheses of
theorem 4.2, have a continuous derivative on $I$, and assume that
$f' \ne 0$. Provided that $x_0 \ne s$, the sequence $\{x_n\}$ generated by
algorithm 4.1 satisfies the hypotheses of theorem 4.5, and algorithm
4.4 thus results in a speedup of convergence.

Proof. As was shown at the beginning of §4.4, the hypotheses of the
corollary suffice not only to show that $d_n \ne 0$, $n = 0, 1, 2, \ldots$, but also
that (4-8) holds, which is precisely what is needed in theorem 4.5.

Although originally motivated by the iterative procedure, the effectiveness
of the $\Delta^2$-acceleration is in no way confined to sequences generated
by iteration. Some other instances where it may be applied are given in
problems 15, 16, and in chapter 7.


Problems 
13. Apply algorithm 4.4 to the sequences obtained in problems 5 and 7. 
14. Let $x_n$ be the $n$th partial sum of the infinite series $\sum_{k=0}^{\infty} a_k$,

$x_n = \sum_{k=0}^{n} a_k.$

Show that algorithm 4.4 in this case yields the formula

$x_n' = x_n + \frac{a_{n+1}^2}{a_{n+1} - a_{n+2}}.$

15. Show that the hypotheses of theorem 4.5 are satisfied if the terms $a_n$ of
the series considered in problem 14 satisfy

$a_n = (a + \varepsilon_n) z^n,$

where $a$ is a constant, $|z| < 1$, and where the sequence $\{\varepsilon_n\}$ tends to zero.
(Hint: Introduce the quantities

$\delta_n = \sup_{k \ge n} |\varepsilon_k|.$)


16. Evaluate the sum 


sa 1 
2 cosh(# log 2) 


to 5 decimal places. (Apply problem 14.) 
17*. Assume the quantities «, in theorem 4.5 are such that the limit 


(4-15) lim “22 = B 


nao ἔῃ 


exists. Show that the convergence of the sequence $\{x_n'\}$ can be sped up
even further by applying algorithm 4.4 once more.

18*. Show that condition (4-15) is satisfied if the sequence $\{x_n\}$ is generated
by algorithm 4.1, provided that the second derivative $f''$ of $f$ exists, is
continuous, and satisfies $f''(s) \ne 0$.


4.6 Quadratic Convergence 


In §4.4 we had assumed that f’(x) # 0 on the interval J, and thus in 
particular that f‘(s) #0. We then obtained a convergence behavior 
characterized by relation (4-8). This is known as /Jinear convergence. 
(The number of correct decimal places is approximately a linear function 
of the number of iterations performed.) Let us now investigate the 
asymptotic behavior of the error if f’(s) = 0. We first note that if this is 
known to be the case, then it is not necessary to verify all the hypotheses 
of theorem 4.2. 


Theorem 4.6 Let $I$ be an interval (finite or infinite), and let the function $f$ be defined on $I$ and satisfy the following conditions:

(i) $f$ and $f'$ are continuous on $I$;
(ii) the equation $x = f(x)$ has a solution $s$, located in the interior of $I$, such that $f'(s) = 0$.

Then there exists a number $d > 0$ such that algorithm 4.1 converges to $s$ for any choice of $x_0$ satisfying $|x_0 - s| \le d$.


The conclusion of the theorem can be expressed by saying that the algorithm always converges when the starting point is “sufficiently close” to the solution.


Proof. Let $I_d$ denote the interval $[s - d, s + d]$. Since $s$ is in the interior of $I$, $I_d$ is contained in $I$ if $d$ is sufficiently small, $d \le d_0$, say. Let $L$ be given, $0 < L < 1$. By the continuity of $f'$, there exists $d$ satisfying $0 < d \le d_0$ such that $|f'(x) - f'(s)| = |f'(x)| \le L$ for $x \in I_d$. An application of the mean value theorem now shows that for $x \in I_d$,

$$|f(x) - s| = |f(x) - f(s)| \le L|x - s| \le Ld < d;$$

thus, the values taken by $f$ in $I_d$ lie in $I_d$. The hypotheses of theorem 4.2 are thus satisfied for the interval $I_d$, and as a consequence algorithm 4.1 converges.

Let us now assume, in addition to the hypotheses of theorem 4.6, that $f''$ exists, is continuous, and does not vanish on $I_d$. As at the beginning of §4.4, we then can show that if $x_0 \neq s$, no $x_n$ will be accidentally equal to $s$, and that the iteration algorithm cannot yield the exact solution in a finite number of steps.


Using Taylor’s theorem with remainder (see Taylor [1959], p. 476) we find the following expression for the error $d_{n+1} = x_{n+1} - s$:

$$d_{n+1} = x_{n+1} - s = f(x_n) - f(s) = f'(s)\,d_n + \tfrac{1}{2} f''(s + \theta_n d_n)\,d_n^2.$$

Here $\theta_n$ denotes, as usual, an unspecified number between zero and one. By virtue of our assumption that $f'(s) = 0$ the above expression simplifies, and we get

$$d_{n+1} = \tfrac{1}{2} f''(s + \theta_n d_n)\,d_n^2. \tag{4-16}$$

Since $d_n \neq 0$ and $d_n \to 0$ for $n \to \infty$ it follows that

$$\lim_{n \to \infty} \frac{d_{n+1}}{d_n^2} = \tfrac{1}{2} f''(s). \tag{4-17}$$

Relation (4-16) states the remarkable fact that if $f'(s) = 0$, the error at the $(n + 1)$st step is proportional to the square of the error at the $n$th step. This type of convergence behavior is known as quadratic convergence. It is frequently, if somewhat vaguely, described by stating that the number of correct decimal places is doubled at each step.

EXAMPLE

4. Let $a > 0$, and let $f(x) = \tfrac{1}{2}(x + a/x)$ for $x > 0$. The equation $x = f(x)$ has the solution $x = \sqrt{a}$. It is easily seen that $f'(\sqrt{a}) = 0$, and that $f''(x) > 0$, $x > 0$. It thus follows that the sequence defined by

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right)$$

converges to $\sqrt{a}$ quadratically, provided that $x_0$ is sufficiently close to $\sqrt{a}$. (Actually, it can be shown that the sequence converges for every choice of $x_0 > 0$, see §4.9.) This algorithm for calculating $\sqrt{a}$ is a special case of Newton’s method, which is discussed in the next section.
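
The quadratic behavior described by relation (4-17) can be observed numerically. The following Python sketch is an illustration only: the function name `sqrt_iteration` and the choices $a = 2$, $x_0 = 1$ are assumptions made here, not part of the text. It iterates $x_{n+1} = \tfrac{1}{2}(x_n + a/x_n)$ and forms the ratios $d_{n+1}/d_n^2$, which for this $f$ should approach $\tfrac{1}{2}f''(\sqrt{a}) = 1/(2\sqrt{a})$.

```python
import math

def sqrt_iteration(a, x0, steps):
    """Return [x_0, ..., x_steps] for x_{n+1} = (x_n + a/x_n)/2 (hypothetical helper)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(0.5 * (x + a / x))
    return xs

a = 2.0                       # illustrative value, not from the text
xs = sqrt_iteration(a, 1.0, 4)
s = math.sqrt(a)
errors = [x - s for x in xs]  # d_n = x_n - s

# Ratios d_{n+1} / d_n^2; by (4-17) they tend to f''(s)/2 = 1/(2*sqrt(a)).
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(4)]
limit = 1.0 / (2.0 * s)
for n, r in enumerate(ratios):
    print(n, r, limit)
```

Only four steps are taken because after that the error falls below the rounding level of double precision and the ratio is no longer meaningful.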


Problems 


19. For what values of the constant M in problem 9 does the sequence 
defined by x, = f(x,-1) converge quadratically to the solution s? 

20. Let the function $f$ have a nonvanishing third derivative, and assume that $s = f(s)$, $f'(s) = f''(s) = 0$. Show that in this case the limit of $d_{n+1}/d_n^3$ exists for $x_0$ sufficiently close to $s$. (The convergence is called cubic in this case.)

21*. Let 


cx? +a 


I(x) = 


Determine the constants $b$, $c$, $d$ in such a manner that the sequence defined by algorithm 4.1 converges cubically to $\sqrt{a}$. Use the algorithm thus obtained to calculate $\sqrt{10}$ to ten decimal places, starting with $x_0 = 3$.


4.7 Newton’s Method 


The reader might well be under the impression that the discussion of quadratic convergence in the last section is without much practical value, since for an equation of the form $x = f(x)$ the condition $f'(s) = 0$ will be satisfied only by accident. However, it will now be shown that, at least for differentiable functions $f$, the basic iteration procedure of algorithm 4.1 can always be reformulated in such a way that it becomes quadratically convergent.

Let the function $F$ be defined and twice continuously differentiable on the interval $I = [a, b]$, let $F'(x) \neq 0$ for $x \in I$, and let the equation

$$F(x) = 0 \tag{4-18}$$

have the solution (necessarily the only solution) $x = s$, where $s$ lies in the interior of $I$. We have already observed that this solution can be found by applying iteration to the function

$$f(x) = x + MF(x),$$

where $M$ is a constant that must satisfy certain inequalities (see problem 9). Unless we are lucky, the convergence of the iteration sequence thus generated is linear. We now ask: Can we determine a function $h$ (depending on $F$, but easily calculable) so that the iteration sequence generated by the function

$$f(x) = x + h(x)F(x)$$

converges to $s$ quadratically?

In view of theorem 4.6 the sole condition to be satisfied by $f$ (in addition to the obviously satisfied condition $f(s) = s$) is $f'(s) = 0$. In view of

$$f'(x) = 1 + h'(x)F(x) + h(x)F'(x)$$

this yields

$$h'(s)F(s) + h(s)F'(s) = -1$$

or, since $F(s) = 0$,

$$h(s) = -\frac{1}{F'(s)}.$$

A simple way to satisfy this condition (we do not claim it is the only way) is to choose

$$h(x) = -\frac{1}{F'(x)}.$$


We are thus led to the following algorithm:


Algorithm 4.7 Choose $x_0$, and determine the sequence $\{x_n\}$ from the recurrence relation

$$x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)}, \quad n = 0, 1, 2, \ldots. \tag{4-19}$$


Algorithm 4.7 is known as Newton’s method or as the Newton-Raphson method. The convergence of Newton’s method for starting values $x_0$ sufficiently close to $s$ follows immediately from theorem 4.6, since the iteration function $f(x) = x - F(x)/F'(x)$ was specifically constructed so as to satisfy $f'(s) = 0$. If we assume, in addition to the hypotheses made above, that $F'''$ exists and is continuous, then it easily follows that $f''$ is continuous, and hence that the convergence of the sequence defined by (4-19), if it takes place at all, is quadratic.

Formula (4-19) has a very simple graphical interpretation. We approximate the graph of the function $F$ by its tangent at the point $x_n$, that is, $F(x)$ is replaced by

$$F(x_n) + (x - x_n)F'(x_n).$$

Setting this expression equal to zero and solving for $x$, we find equation (4-19) (see Fig. 4.7). Intuitively appealing as they may be, considerations such as these tell us nothing about the nature of convergence, nor are they easily extended to the case of systems of equations.
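
Recurrence (4-19) translates almost literally into code. The sketch below is a minimal illustration, not a robust solver: the function name `newton`, the stopping rule on the step size, and the parameters `tol` and `max_iter` are assumptions introduced here. The test equation $F(x) = x - e^{-x}$ is the one whose solution appears in example 9.

```python
import math

def newton(F, dF, x0, tol=1e-12, max_iter=50):
    """Apply x_{n+1} = x_n - F(x_n)/F'(x_n) until the step is at most tol.

    F and dF are callables for F and its derivative F'; the caller is
    responsible for F'(x) != 0 along the iteration (assumption of section 4.7).
    """
    x = x0
    for _ in range(max_iter):
        step = F(x) / dF(x)
        x = x - step
        if abs(step) <= tol:
            return x
    raise RuntimeError("Newton iteration did not meet the tolerance")

root = newton(lambda x: x - math.exp(-x),
              lambda x: 1.0 + math.exp(-x),
              0.5)
print(root)
```

Stopping on the size of the last step is only one possible criterion; it says nothing rigorous about the remaining error.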


4.8 A Non-local Convergence Theorem for Newton’s Method 


The results proved in §4.7 are still unsatisfactory inasmuch as convergence is proved only for starting values “sufficiently close” to the solution $s$. Since the exact solution is unknown, the question of how to find the first approximation $x_0$ remains unanswered. We now shall prove a result which explicitly specifies an interval in which the first approximation may be taken. The one new hypothesis which must be made is that $F''$ does not change sign in the interval under consideration.


Theorem 4.8 Let the function $F$ be defined and twice continuously differentiable on the closed finite interval $[a, b]$, and let the following conditions be satisfied:

(i) $F(a)F(b) < 0$;
(ii) $F'(x) \neq 0$, $x \in [a, b]$;
(iii) $F''(x)$ is either $\ge 0$ or $\le 0$ for all $x \in [a, b]$;
(iv) if $c$ denotes that endpoint of $[a, b]$ at which $|F'(x)|$ is smaller, then

$$\left|\frac{F(c)}{F'(c)}\right| \le b - a.$$

Then Newton’s method converges to the (only) solution $s$ of $F(x) = 0$ for any choice of $x_0$ in $[a, b]$.


Some explanation of the hypotheses is in order. Condition (i) merely states that $F(a)$ and $F(b)$ have different signs, and hence that the equation $F(x) = 0$ has at least one solution in $(a, b)$. By virtue of condition (ii) there is only one solution. Condition (iii) states that the graph of $F$ is either concave from above or concave from below. Condition (iv) states that the tangent to the curve $y = F(x)$ at that endpoint where $|F'(x)|$ is smallest intersects the x-axis within the interval $[a, b]$.


Proof. Theorem 4.8 covers the following four different situations:

(a) $F(a) < 0$, $F(b) > 0$, $F''(x) \le 0$ ($c = b$);
(b) $F(a) > 0$, $F(b) < 0$, $F''(x) \ge 0$ ($c = b$);
(c) $F(a) < 0$, $F(b) > 0$, $F''(x) \ge 0$ ($c = a$);
(d) $F(a) > 0$, $F(b) < 0$, $F''(x) \le 0$ ($c = a$).

Cases (b) and (d) are readily reduced to the cases (a) and (c), respectively, by considering the function $-F$ in place of $F$. (This change does not change the sequence $\{x_n\}$.) Case (c) is reduced to case (a) by replacing $x$ by $-x$. (This changes the sequence $\{x_n\}$ into $\{-x_n\}$, and the solution $s$ into $-s$.) It thus suffices to prove the theorem in case (a). Here the graph of $F$ looks as given in figure 4.8.

Figure 4.8

Let $s$ be the unique solution of $F(x) = 0$. We first assume that $a \le x_0 \le s$. By virtue of $F(x_0) \le 0$, it is clear that

$$x_1 = x_0 - \frac{F(x_0)}{F'(x_0)} \ge x_0.$$

We assert that $x_n \le s$, $x_{n+1} \ge x_n$ for all values of $n$. This being true for $n = 0$, it suffices to perform the induction step from $n$ to $n + 1$. If $x_n \le s$, then by the mean value theorem

$$-F(x_n) = F(s) - F(x_n) = (s - x_n)F'(x_n^*),$$

where $x_n \le x_n^* \le s$. By virtue of $F''(x) \le 0$, $F'$ is decreasing, hence $F'(x_n^*) \le F'(x_n)$ and

$$-F(x_n) \le (s - x_n)F'(x_n),$$

$$x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)} \le x_n + (s - x_n) = s.$$

Consequently, $F(x_{n+1}) \le 0$ and $x_{n+2} = x_{n+1} - F(x_{n+1})/F'(x_{n+1}) \ge x_{n+1}$, completing the induction step.

Since every bounded monotonic sequence has a limit (see Taylor [1959], p. 453) it follows that $\lim_{n \to \infty} x_n$ exists and is $\le s$. Denoting this limit by $q$ and letting $n \to \infty$ in the relation

$$x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)},$$

it follows by the continuity of $F$ and $F'$ that

$$q = q - \frac{F(q)}{F'(q)},$$

and hence that $F(q) = 0$, implying that $q = s$.

Consider now the case where $s < x_0 \le b$. Using once more the mean value theorem, we have $F(x_0) = F'(x_0^*)(x_0 - s)$, where $s < x_0^* < x_0$, and hence, since $F'$ is decreasing, $F(x_0) \ge (x_0 - s)F'(x_0)$. It follows that

$$x_1 = x_0 - \frac{F(x_0)}{F'(x_0)} \le x_0 - (x_0 - s) = s.$$

On the other hand $F(x_0) = F(b) - (b - x_0)F'(x_0')$ where $x_0 \le x_0' \le b$, hence

$$F(x_0) \le F(b) - (b - x_0)F'(b).$$

By virtue of condition (iv) of the theorem we thus have

$$x_1 = x_0 - \frac{F(x_0)}{F'(x_0)} \ge x_0 - \frac{F(x_0)}{F'(b)} \ge x_0 - \frac{F(b)}{F'(b)} + (b - x_0) \ge x_0 - (b - a) + (b - x_0) = a.$$

Hence $a \le x_1 \le s$, and it follows by what has been proved above for the case $a \le x_0 \le s$ that the sequence $\{x_n\}$ converges to $s$. Thus the proof of theorem 4.8 is complete.

The literature on Newton’s method is extensive. Convergence can be proved under various sets of conditions other than those of theorem 4.8, and it is also possible to give bounds for the error after a finite number of steps (see Ostrowski [1960]). The theorem given above, however, covers some of the most important special cases.


4.9 Some Special Cases of Newton’s Method


We now shall give some examples for the application of theorem 4.8.

(i) Determination of square roots. Let $c$ be a given number, $c > 0$, and let

$$F(x) = x^2 - c \quad (x > 0).$$

We wish to solve the equation $F(x) = 0$, i.e., to compute $x = \sqrt{c}$. Newton’s method takes the form

$$x_{n+1} = x_n - \frac{F(x_n)}{F'(x_n)} = x_n - \frac{x_n^2 - c}{2x_n}$$

or

$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{c}{x_n}\right). \tag{4-20}$$

We are thus led to the algorithm considered in example 4. Since $F'(x) > 0$, $F''(x) > 0$ for $x > 0$, we are in case (c) of theorem 4.8. In any interval $[a, b]$ with $0 < a < \sqrt{c} < b$ the smallest value of the slope occurs at $x = a$, and it is easily seen that condition (iv) is satisfied for every $b \ge \tfrac{1}{2}(a + c/a)$. Thus it follows that the sequence defined by (4-20) converges to $\sqrt{c}$ for every choice of $x_0 > 0$.

Formula (4-20) states that the new approximation is always the arithmetic mean of the old approximation and of the result of dividing $c$ by the old approximation. For work on a desk computer it is not necessary to work with full accuracy from the beginning, since every new value can be regarded as a new starting value of the iteration.


EXAMPLE
5. $c = 10$, $x_0 = 3$ yields

    x_0 = 3             c/x_0 = 3.3
    x_1 = 3.15          c/x_1 = 3.1746
    x_2 = 3.1623        c/x_2 = 3.16225532
    x_3 = 3.16227766    c/x_3 = 3.16227766

Since $x_3 = c/x_3$ to the number of digits given, the last value is accepted as final.

(ii) Finding roots of arbitrary order. If $F(x) = x^k - c$, where $c > 0$ and $k$ is any positive integer, Newton’s method yields the formula

$$x_{n+1} = x_n - \frac{x_n^k - c}{k x_n^{k-1}}$$

or

$$x_{n+1} = \frac{1}{k}\left[(k - 1)x_n + \frac{c}{x_n^{k-1}}\right]. \tag{4-21}$$

Again the conditions of theorem 4.8 are satisfied for every interval $[a, b]$ if $0 < a < \sqrt[k]{c}$ and $b$ is sufficiently large, and the sequence defined by (4-21) converges to $\sqrt[k]{c}$ for arbitrary $x_0 > 0$.
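
Formula (4-21) can be tried out directly. The following Python sketch is illustrative only; the name `kth_root`, the fixed number of steps, and the sample values $c = 27$, $k = 3$, $x_0 = 1$ are assumptions chosen here.

```python
def kth_root(c, k, x0, steps=60):
    """Iterate x_{n+1} = ((k-1)*x_n + c / x_n**(k-1)) / k, formula (4-21).

    Assumes c > 0, integer k >= 1 and x0 > 0; a fixed step count is used
    instead of a convergence test to keep the sketch short.
    """
    x = x0
    for _ in range(steps):
        x = ((k - 1) * x + c / x ** (k - 1)) / k
    return x

cube_root = kth_root(27.0, 3, 1.0)
print(cube_root)
```

Starting far from the root, the first iterate overshoots and the early steps reduce the error only by a factor of roughly $(k-1)/k$ each; the quadratic phase sets in once the iterate is close to $\sqrt[k]{c}$.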

(iii) Finding reciprocals without division. For a given $c > 0$ we wish to determine the number

$$s = \frac{1}{c}.$$

This may be regarded as the solution of the equation

$$F(x) = \frac{1}{x} - c = 0.$$

Newton’s method yields

$$x_{n+1} = x_n - \frac{\dfrac{1}{x_n} - c}{-\dfrac{1}{x_n^2}} = x_n + (1 - c x_n)x_n$$

or

$$x_{n+1} = x_n(2 - c x_n). \tag{4-22}$$

No divisions are necessary to compute the sequence $\{x_n\}$. Since

$$F'(x) = -\frac{1}{x^2} < 0, \qquad F''(x) = \frac{2}{x^3} > 0$$

for $x > 0$, we are in case (b) of theorem 4.8, and convergence of the algorithm is assured if we can find an interval $[a, b]$ so that $a < c^{-1} < b$ and

$$\frac{F(b)}{F'(b)} = b(bc - 1) \le b - a.$$

The last inequality is satisfied if

$$b \le \frac{1 + \sqrt{1 - ac}}{c}.$$

Since $a > 0$ may be made arbitrarily small, this means that the sequence defined by (4-22) converges to $c^{-1}$ for any choice of $x_0$ such that $0 < x_0 < 2c^{-1}$.
EXAMPLE

6. To calculate $e^{-1}$, where $e = 2.7182818$. Starting with $x_0 = 0.3$, we find

    x_0 = 0.3           2 - e x_0 = 1.1846
    x_1 = 0.355         2 - e x_1 = 1.0350106
    x_2 = 0.367429      2 - e x_2 = 1.0012244
    x_3 = 0.36787889    2 - e x_3 = 1.00000150
    x_4 = 0.36787944    2 - e x_4 = 1.00000000.


The quadratic nature of the convergence is quite evident in this example 
(doubling of the number of zeros in the second column at each step). 
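
The division-free recurrence (4-22) can be checked with a few lines of Python. This is a sketch under assumptions: the helper name `reciprocal_iterates` is invented for illustration, and the run below carries full double precision at every step, so its intermediate digits differ slightly from the rounded hand computation of example 6.

```python
import math

def reciprocal_iterates(c, x0, steps):
    """Return [x_0, ..., x_steps] for x_{n+1} = x_n * (2 - c * x_n), formula (4-22).

    Only multiplication and subtraction are used. Convergence to 1/c
    requires 0 < x0 < 2/c, as stated in the text.
    """
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x * (2.0 - c * x))
    return xs

xs = reciprocal_iterates(math.e, 0.3, 6)
# 1 - c*x_n is squared at every step, which is the quadratic convergence.
residuals = [1.0 - math.e * x for x in xs]
for x, r in zip(xs, residuals):
    print(x, r)
```

The identity $1 - c x_{n+1} = (1 - c x_n)^2$, which follows from (4-22) by direct substitution, explains both the doubling of zeros and the restriction $0 < x_0 < 2c^{-1}$.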


Problems 


22. Show that the function $F(x) = x - e^{-x}$ satisfies the conditions of theorem 4.8 in the interval $[0, 1]$. Hence solve the equation by Newton’s method, starting with $x_0 = 0.5$.


23. Calculate “7 to nine decimal places. 
24. Determine a numerical value of 4 without division, beginning a Newton 
iteration with $x_0 = 0.3$.
25. Show that the numbers defined by (4-20) satisfy

$$\frac{x_n - \sqrt{c}}{x_n + \sqrt{c}} = \left(\frac{x_0 - \sqrt{c}}{x_0 + \sqrt{c}}\right)^{2^n}, \quad n = 0, 1, 2, \ldots.$$

Hence verify that the sequence $\{x_n\}$ converges to $\sqrt{c}$ quadratically for arbitrary choices of $x_0 > 0$.
26. If (4-22) is started with $x_0 = 1$, show that

$$x_n = \frac{1 - (1 - c)^{2^n}}{c}.$$

Deduce that the sequence thus generated converges for $0 < c < 2$, and that the convergence is again quadratic.
27*. How do we have to choose the function $h$ in the formula


ει τ By 
ὈΞῪ "οί eel 


such that the iteration defined by f converges cubically to a solution of 
F(x) = 0? 


4.10 Newton’s Method Applied to Polynomials 


Newton’s method is especially well suited to the problem of deter- 
mining the zeros of a polynomial in view of the simple algorithm that is 
available for calculating both the value of a polynomial and that of its 
first derivative. Let 


$$p(x) = a_0 x^N + a_1 x^{N-1} + \cdots + a_N$$

be a given polynomial of degree $N$. If $z$ is any (real or complex) number, and if the constants $b_0, b_1, \ldots, b_N$ and $c_0, c_1, \ldots, c_{N-1}$ are determined from the recurrence relations† of algorithm 3.6,

$$b_0 = a_0, \quad b_n = z b_{n-1} + a_n \quad (n = 1, 2, \ldots, N), \tag{4-23}$$

$$c_0 = b_0, \quad c_n = z c_{n-1} + b_n \quad (n = 1, 2, \ldots, N - 1), \tag{4-24}$$

then we have by a special case of theorem 3.6

$$b_N = p(z), \qquad c_{N-1} = p'(z).$$

The quantities p(z) and ρ΄ (2) required in each step of Newton’s method can 


+ We now write ¢, in place of x, to avoid confusion with the variable x. 




thus be calculated very easily. The computation proceeds as indicated in 
scheme 4.10. 


    a_0      a_1      a_2     ...    a_{N-1}     a_N
     ↓        ↓        ↓               ↓          ↓
    b_0  ⇒   b_1  ⇒   b_2  ⇒  ...  ⇒  b_{N-1}  ⇒  b_N
     ↓        ↓        ↓               ↓
    c_0  ⇒   c_1  ⇒   c_2  ⇒  ...  ⇒  c_{N-1}

Scheme 4.10

↓ indicates addition, ⇒ multiplication by $z$ and addition.

EXAMPLE
7. To determine a zero of the polynomial

$$p(x) = x^3 - x^2 + 2x + 5$$

near $x = -1$. (In table 4.10, the coefficients $a_n$, $b_n$, $c_n$ are arranged in columns rather than in rows.)


Table 4.10

    z                   a_n    b_n           c_n           p/p'
    x_0 = -1             1      1             1
                        -1     -2            -3
                         2      4             7             0.142857
                         5      1

    x_1 = -1.142857      1      1             1
                        -1     -2.142857     -3.285714
                         2      4.448979      8.204076     -0.010305
                         5     -0.084546

    x_2 = -1.129807      1      1             1
                        -1     -2.129807     -3.259614
                         2      4.406241      8.089006      0.002691
                         5      0.021764

    x_3 = -1.132498      1      1             (previous
                        -1     -2.132498      value of p'
                         2      4.415050      retained!)   -0.000004
                         5     -0.000035

    x_4 = -1.132494      1      1
                        -1     -2.132494
                         2      4.415037
                         5     -0.000002
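
The recurrences (4-23) and (4-24) and their use inside Newton’s method can be sketched in Python as follows. The function names, the tolerance, and the iteration cap are assumptions made for this illustration; the coefficient list is that of the polynomial $x^3 - x^2 + 2x + 5$ from example 7. Unlike the hand computation in table 4.10, the sketch recomputes $p'$ at every step and carries full double precision.

```python
def horner_with_derivative(a, z):
    """Evaluate p(z) and p'(z) for p with coefficients a = [a_0, ..., a_N], N >= 1.

    b follows recurrence (4-23), c follows recurrence (4-24);
    b[-1] is b_N = p(z) and c[-1] is c_{N-1} = p'(z).
    """
    b = [a[0]]
    for n in range(1, len(a)):
        b.append(z * b[-1] + a[n])
    c = [b[0]]
    for n in range(1, len(a) - 1):
        c.append(z * c[-1] + b[n])
    return b[-1], c[-1]

def newton_polynomial(a, z0, tol=1e-12, max_iter=50):
    """Newton's method (4-19) with p and p' supplied by the Horner scheme."""
    z = z0
    for _ in range(max_iter):
        p, dp = horner_with_derivative(a, z)
        step = p / dp
        z = z - step
        if abs(step) <= tol:
            return z
    raise RuntimeError("Newton iteration did not meet the tolerance")

coeffs = [1.0, -1.0, 2.0, 5.0]           # x^3 - x^2 + 2x + 5
p0, dp0 = horner_with_derivative(coeffs, -1.0)
zero = newton_polynomial(coeffs, -1.0)
print(p0, dp0, zero)
```

The first call reproduces the top block of table 4.10, where $p(-1)$ and $p'(-1)$ appear as the last entries of the $b$ and $c$ columns.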




If $z$ is a zero of the polynomial $p$, then, according to theorem 2.5b, $p$ can be represented in the form

$$p(x) = (x - z)q(x), \tag{4-25}$$

where

$$q(x) = b_0 x^{N-1} + b_1 x^{N-2} + \cdots + b_{N-1}$$

is a certain polynomial of degree $N - 1$. Multiplying the two polynomials on the right of equation (4-25) and comparing coefficients of like powers of $x$, we find

$$a_0 = b_0,$$

$$a_{n+1} = -z b_n + b_{n+1}, \quad n = 0, 1, \ldots, N - 2.$$

Solving for $b_{n+1}$ and replacing $n + 1$ by $n$, we find that the second relation is identical with (4-23). Thus we have:


Theorem 4.10 If $z$ is a zero of the polynomial $p$, the coefficients $b_0, b_1, \ldots, b_{N-1}$ defined by equation (4-23) are identical with the coefficients of the polynomial $(x - z)^{-1}p(x)$.


EXAMPLE
8. Accepting $x_4 = -1.132494$ as a zero of the polynomial $p$ considered in example 7, we have

$$q(x) = \frac{p(x)}{x - x_4} = x^2 - 2.132494x + 4.415037.$$

Thus, having found a zero of a polynomial $p$ of degree $N$, the polynomial $q$ whose zeros are identical with those of $p$ save the one which has already been determined is easily constructed. The remaining zeros of $p$ can now be found more easily as the zeros of $q$. The process of passing from $p$ to $q$ is known as deflating $p$. By successive deflations, the zeros of a given polynomial thus can be found by working with polynomials of successively lower degrees.
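
Deflation as described by theorem 4.10 amounts to keeping the first $N$ numbers produced by recurrence (4-23). The sketch below illustrates this for the polynomial of example 7 with the approximate zero of example 8; the helper names `deflate` and `polyval` are assumptions made here, and the remaining two zeros are obtained from the quadratic formula rather than by a further Newton iteration.

```python
import cmath

def deflate(a, z):
    """Return [b_0, ..., b_{N-1}], the coefficients of q(x) = p(x)/(x - z).

    Uses recurrence (4-23); z is assumed to be (an approximation to) a zero of p.
    """
    b = [a[0]]
    for n in range(1, len(a) - 1):
        b.append(z * b[-1] + a[n])
    return b

def polyval(a, x):
    """Evaluate the polynomial with coefficients a = [a_0, ..., a_N] at x."""
    value = 0
    for coeff in a:
        value = value * x + coeff
    return value

p = [1.0, -1.0, 2.0, 5.0]        # x^3 - x^2 + 2x + 5
z = -1.132494                    # approximate zero accepted in example 8
q = deflate(p, z)                # coefficients of the deflated quadratic

# Remaining zeros from the quadratic formula (complex in this case).
disc = cmath.sqrt(q[1] ** 2 - 4.0 * q[0] * q[2])
roots = [(-q[1] + disc) / (2.0 * q[0]), (-q[1] - disc) / (2.0 * q[0])]
residuals = [abs(polyval(p, r)) for r in roots]
print(q, roots, residuals)
```

The residuals are small but not at rounding level, because $z$ is only a six-decimal approximation; this is the accumulation effect that motivates rechecking each zero against the undeflated polynomial.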

Unfortunately, the above process is subject to accumulation of round- 
off errors in view of the fact that the data of the problem (in this case, the 
coefficients of the given polynomial) enters the computation only at the 
first step. It is thus advisable to recheck each zero found by using it once 
more as a starting value for a Newton’s process applied to the full 
(undeflated) polynomial p. 

We mention without proof that local convergence of Newton’s method 
can also be established for complex zeros and for polynomials with complex 
coefficients. However, for the determination of pairs of conjugate 
complex zeros of polynomials with real coefficients a more efficient method 
is available (see §5.6). 




Problems 
28. Determine the unique positive zero of the polynomial 
p(x) = x® — χ — x -— 1 


by Newton’s method.
29. Find the zero near 0.9 of the polynomial

$$p(x) = 4x^6 - 5x^5 + 4x^4 - 3x^3 + 7x^2 - 7x + 1.$$

30. Prove the relation $c_{N-1} = p'(z)$ by differentiating the recurrence relation (4-24) and observing that the $b_n$ are functions of $z$.


4.11 Some Modifications of Newton’s Method 


Newton’s method requires the evaluation of the derivative $F'$ of the given function $F$. While in most textbook problems this requirement is trivial, this may not be the case in more involved situations, for instance if the function $F$ itself is the result of a complicated computation. It therefore seems worthwhile to discuss some methods for solving $F(x) = 0$ that do not require the evaluation of $F'$ and nevertheless retain some of the favorable convergence properties of Newton’s method.

(i) Whittaker’s method (Whittaker and Robinson [1928]). The simplest way to avoid the computation of $F'$ is to replace $F'(x_n)$ in (4-19) by a constant value, say $m$. The resulting formula

$$x_{n+1} = x_n - \frac{F(x_n)}{m} \tag{4-26}$$

then defines, for a certain range of values of $m$, a linearly converging procedure, unless we happen to pick $m = F'(s)$. If the estimate of $m$ is good, convergence may nevertheless be quite rapid. Especially in the final stages of Newton’s process it is usually not necessary to recompute $F'$ at each step.

(ii) Regula falsi. Here the value of the derivative $F'(x_n)$ is approximated by the difference quotient

$$\frac{F(x_n) - F(x_{n-1})}{x_n - x_{n-1}}$$

formed with the two preceding approximations. There results the formula

$$x_{n+1} = x_n - \frac{x_n - x_{n-1}}{F(x_n) - F(x_{n-1})}\,F(x_n). \tag{4-27}$$

This is identical with

$$x_{n+1} = \frac{x_{n-1}F(x_n) - x_n F(x_{n-1})}{F(x_n) - F(x_{n-1})},$$

but for numerical purposes the “incremental” form (4-27) is preferable.

The algorithm suggested by (4-27) is known as the regula falsi. It is defined by a difference equation of order 2 and thus is not covered by the general theory given at the beginning of this chapter. A more detailed investigation (see Ostrowski [1960], p. 17) shows that the degree of convergence of the regula falsi lies somewhere between that of Newton’s method and of ordinary iteration.

(iii) Muller’s method (Muller [1956]). The regula falsi can be obtained by approximating the graph of the function $F$ by the straight line passing through the points $(x_{n-1}, F(x_{n-1}))$ and $(x_n, F(x_n))$. The point of intersection of this line with the x-axis defines the new approximation $x_{n+1}$ (see figure 4.11).

Figure 4.11

Instead of approximating $F$ by a linear function, it seems natural to try to obtain more rapid convergence by approximating $F$ by a polynomial $p$ of degree $k > 1$ coinciding with $F$ at the points $x_n, x_{n-1}, \ldots, x_{n-k}$, and to determine $x_{n+1}$ as one of the zeros of $p$. Muller has made a detailed study of the case $k = 2$ and found that this choice of $k$ yields very satisfactory results. Since the construction of $p$ depends on the theory of the interpolating polynomial, we postpone the derivation of Muller’s algorithm to chapter 10.

(iv) Newton’s method in the case $F'(s) = 0$. Newton’s algorithm was derived in §4.7 under the assumption that $F'(x) \neq 0$, and thus in particular that $F'(s) \neq 0$. Let us now consider the general situation where

$$F(s) = F'(s) = \cdots = F^{(m-1)}(s) = 0, \quad F^{(m)}(s) \neq 0,$$

where $m \ge 1$. If we set $x = s + h$, the iteration function for Newton’s method,

$$f(x) = x - \frac{F(x)}{F'(x)},$$

can be expanded in powers of $h$ to give

$$f(s + h) = s + h - \frac{(m!)^{-1}F^{(m)}(s)h^m + O(h^{m+1})}{[(m - 1)!]^{-1}F^{(m)}(s)h^{m-1} + O(h^m)} = s + \left(1 - \frac{1}{m}\right)h + O(h^2).$$

From this we find

$$f'(s) = \lim_{h \to 0} \frac{f(s + h) - s}{h} = 1 - \frac{1}{m}.$$

Thus, if $m \neq 1$, $f'(s) \neq 0$, and the convergence fails to be quadratic. However, the above analysis shows how to modify the iteration function in order to achieve quadratic convergence. If we set

$$f(x) = \begin{cases} x - \dfrac{mF(x)}{F'(x)}, & x \neq s, \\ s, & x = s, \end{cases}$$

then a computation similar to the one performed above shows that $f'(s) = 0$. By theorem 4.6, the sequence defined by

$$x_{n+1} = x_n - \frac{mF(x_n)}{F'(x_n)} \tag{4-28}$$

converges to $s$ quadratically, provided that $x_0$ is sufficiently close to $s$. For $m = 1$ this algorithm reduces to the ordinary Newton process.

Admittedly the above discussion is somewhat academic, because only rarely in practice do we have an a priori knowledge of the fact that $F'(x) = 0$ at a solution of $F(x) = 0$. However, formula (4-28) has also been used in the early stages of Newton’s process if two solutions of $F(x) = 0$ are very close together. In this case $m$ was chosen in a heuristic fashion somewhere between 1 and 2 (see Forsythe [1958], p. 234).
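
Two of the modifications discussed in this section, the regula falsi in its incremental form (4-27) and the multiplicity-corrected Newton step (4-28), can be sketched in Python as follows. Function names, tolerances, and the two test functions are assumptions chosen for illustration; in particular $F(x) = (x - 1)^2(x + 2)$, which has a double zero at $s = 1$, does not come from the text.

```python
import math

def regula_falsi(F, x0, x1, tol=1e-12, max_iter=100):
    """Iterate (4-27): F'(x_n) is replaced by a difference quotient."""
    f0, f1 = F(x0), F(x1)
    for _ in range(max_iter):
        if f1 == f0:                      # difference quotient undefined; stop
            return x1
        x2 = x1 - (x1 - x0) / (f1 - f0) * f1
        if abs(x2 - x1) <= tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, F(x2)
    raise RuntimeError("regula falsi did not meet the tolerance")

def newton_multiple(F, dF, m, x0, tol=1e-12, max_iter=50):
    """Iterate (4-28), x_{n+1} = x_n - m*F(x_n)/F'(x_n), for a zero of order m."""
    x = x0
    for _ in range(max_iter):
        d = dF(x)
        if d == 0.0:                      # F' vanishes, e.g. exactly at the zero
            return x
        step = m * F(x) / d
        x = x - step
        if abs(step) <= tol:
            return x
    raise RuntimeError("modified Newton iteration did not meet the tolerance")

root_rf = regula_falsi(lambda x: x - math.exp(-x), 0.0, 1.0)

F = lambda x: (x - 1.0) ** 2 * (x + 2.0)      # double zero at 1 (illustrative)
dF = lambda x: 3.0 * (x - 1.0) * (x + 1.0)
root_m2 = newton_multiple(F, dF, 2, 2.0)
print(root_rf, root_m2)
```

With `m=1` the second function is the ordinary Newton process, which for the double zero above converges only linearly, in line with $f'(s) = 1 - 1/m$.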


Problems 


31. Show that if the regula falsi is applied to the equation $x^2 = 1$, then, assuming that the errors $d_n = x_n - 1$ tend to zero, the stronger relation

$$\lim_{n \to \infty} \frac{d_{n+1}}{d_n d_{n-1}} = \frac{1}{2}$$

holds.

32. Give a closed formula for $x_n$ if the equation $x^2 = 0$ is solved (a) by the ordinary Newton’s method, (b) by the modification of Newton’s method given by (4-28), and discuss the result. (Assume $x_0 = 1$ in both cases.)

33. Show that if the equation $[F(x)]^m = 0$ is solved by (4-28), there results the ordinary Newton’s method for solving $F(x) = 0$.


4.12 The Diagonal Aitken Procedure 


We continue to discuss the problem of achieving quadratic (i.e., 
Newton-like) convergence by methods not requiring the evaluation of any 
derivatives. We recall that none of the substitutes for Newton's method 
offered in the preceding section quite achieved quadratic convergence. 
Returning to equations written in the form 


(4-29) x = f(x) 


we now shall show that true quadratic convergence can be achieved 
without derivative evaluation by a modification of the Aitken acceleration 
procedure discussed in §4.4. 

This modification proceeds as follows. We start out, as we do in ordinary iteration, by choosing an initial value $x_0$ and calculating $x_1 = f(x_0)$, $x_2 = f(x_1)$. Aitken’s formula (4-11) is now applied to $x_0$, $x_1$, $x_2$, yielding

$$x_0' = x_0 - \frac{(x_1 - x_0)^2}{x_2 - 2x_1 + x_0}. \tag{4-30}$$

The number $x_0' = x_0^{(1)}$ is used as a new starting value for two more iterations. Having calculated $x_1^{(1)} = f(x_0^{(1)})$, $x_2^{(1)} = f(x_1^{(1)})$, we apply Aitken’s formula again, obtaining an accelerated value $x_0^{(2)}$ which in turn is used to start a new iteration, etc. If a denominator in Aitken’s formula happens to be zero, we set $x_0^{(k+1)} = x_0^{(k)}$, thus in effect terminating the iteration. Schematically, the algorithm is described by the following table:

    x_0^{(0)}    x_0^{(1)}    x_0^{(2)}    ...
    x_1^{(0)}    x_1^{(1)}    x_1^{(2)}
    x_2^{(0)}    x_2^{(1)}    x_2^{(2)}

Scheme 4.12

Since $x_1 = f(x_0)$, $x_2 = f(x_1) = f(f(x_0))$, the values $x^{(k+1)} = x_0^{(k+1)}$ can be thought of as being generated from $x^{(k)} = x_0^{(k)}$ in a single iteration step by means of a function $F$ defined in terms of $f$ as follows. We set

$$N(x) = f(f(x)) - 2f(x) + x$$

and put

$$F(x) = \begin{cases} x - \dfrac{[f(x) - x]^2}{N(x)}, & N(x) \neq 0, \\ x, & N(x) = 0. \end{cases} \tag{4-31}$$


Thus we are led to the following formal statement of the new procedure: 


Algorithm 4.12 Choose $x^{(0)}$, and determine the sequence $\{x^{(k)}\}$ recursively by

$$x^{(k+1)} = F(x^{(k)}), \quad k = 0, 1, 2, \ldots, \tag{4-32}$$


where F is defined by equation (4-31). 


This algorithm is sometimes known as Steffensen’s iteration, as it was 
first proposed by Steffensen [1933]. 
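
A direct transcription of definition (4-31) and recurrence (4-32) into Python might look as follows. The function names and the fixed number of steps are assumptions of this sketch; the test equation $x = e^{-x}$ with starting value 0.5 is the one used in example 9.

```python
import math

def steffensen_step(f, x):
    """One application of F from (4-31): two plain iterations plus Aitken's formula."""
    fx = f(x)
    ffx = f(fx)
    N = ffx - 2.0 * fx + x
    if N == 0.0:
        return x                 # second branch of (4-31): iteration stalls
    return x - (fx - x) ** 2 / N

def steffensen(f, x0, steps):
    """Return [x^(0), ..., x^(steps)] generated by x^(k+1) = F(x^(k)), relation (4-32)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(steffensen_step(f, xs[-1]))
    return xs

top_row = steffensen(lambda x: math.exp(-x), 0.5, 3)
print(top_row)
```

The returned list corresponds to the top row of scheme 4.12. Once the iterate is accurate to rounding level, the numerator and $N$ in (4-31) are both dominated by cancellation, so in practice the iteration should be stopped rather than continued indefinitely.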

The questions arise whether Steffensen’s iteration is well defined, whether the sequence $\{x^{(k)}\}$ converges to a solution of $x = f(x)$, and whether the degree of convergence is higher than linear. We shall give affirmative answers under the following hypotheses: The equation $x = f(x)$ has a solution $x = s$, and the function $f$ is three times continuously differentiable in a neighborhood of $x = s$ and satisfies $f'(s) \neq 1$. It will be shown that these hypotheses imply that $F$ satisfies $F(s) = s$, is twice continuously differentiable in a neighborhood of $x = s$, and satisfies $F'(s) = 0$. Quadratic convergence of the sequence $\{x^{(k)}\}$ for all $x^{(0)}$ sufficiently close to $s$ then follows as a consequence of theorem 4.6.

As direct differentiation of $F$ turns out to be cumbersome, we introduce an auxiliary function $g$ by setting

$$g(h) = \begin{cases} \dfrac{f(s + h) - s}{h}, & h \neq 0, \\ f'(s), & h = 0. \end{cases}$$

The function $g$ is still at least twice continuously differentiable near $h = 0$. The definition of $g$ implies

$$f(s + h) = s + hg(h),$$

$$f(f(s + h)) = f(s + hg(h)) = s + hg(h)\,g(hg(h)).$$

If $x = s + h$, it follows that

$$N(x) = f(f(x)) - 2f(x) + x = hG(h),$$

where

$$G(h) = 1 - 2g(h) + g(h)\,g(hg(h)).$$

Like $g$, the function $G$ is twice continuously differentiable near $h = 0$; furthermore, since $g(0) = f'(s) \neq 1$,

$$G(0) = [g(0) - 1]^2 \neq 0, \tag{4-33}$$

showing that $G(h) \neq 0$ for $|h|$ sufficiently small. Hence $N(x) \neq 0$ for $x \neq s$, $|x - s|$ sufficiently small, and by (4-31),

$$F(x) = s + h - h\,\frac{[g(h) - 1]^2}{G(h)}.$$

This representation of $F$ also holds for $h = 0$; it shows that $F$, too, is twice continuously differentiable, and that $F(s) = s$. Furthermore,

$$F'(s) = \lim_{h \to 0} \frac{F(s + h) - s}{h} = \lim_{h \to 0} \left\{1 - \frac{[g(h) - 1]^2}{G(h)}\right\} = 1 - \frac{[g(0) - 1]^2}{G(0)} = 0$$

by virtue of (4-33).


EXAMPLE 

9. We apply Steffensen iteration to the equation $x = e^{-x}$ considered in example 1. Starting with $x^{(0)} = 0.5$, we obtain the following values (arranged in the manner of scheme 4.12):


0.500000 0.567624 0.567144 0.567143 
0.606531 0.566871 0.567143 
0.545239 0.567298 0.567143. 


The values in the top row are seen to converge very rapidly. 


Problems 


34. Apply Steffensen iteration to the solution of Kepler’s equation given in 
problem 5, where m = 1, E = 0.8. 
35. Apply algorithm 4.12 to the equation 


ee | 


starting with $x^{(0)} = 3$. Compare the sequence $\{x^{(k)}\}$ with the sequence obtained by Newton’s method for computing $\sqrt{10}$.
36. Give a somewhat simpler proof of the quadratic convergence of Steffensen’s iteration by assuming that $f$ can be expanded in powers of $h = x - s$.

37. Show that if, in addition to the hypotheses stated above, $f'(s) = 0$, then also $F''(s) = 0$. Thus in this case Steffensen’s iteration converges at least cubically (see Householder [1953], p. 128).


4.13 A Non-local Convergence Theorem for Steffensen’s Method†


As was the case with the corresponding result concerning Newton’s method, the convergence statement concerning Steffensen’s iteration proved in the last section is unsatisfactory because it guarantees convergence only for choices of $x^{(0)}$ “sufficiently close” to the solution $s$. We now shall prove a result which guarantees convergence no matter where the iteration is started. The extra hypothesis which will be added concerns the signs of $f'$ and of $f''$.


Theorem 4.13 Let $I$ denote the semi-infinite interval $(a, \infty)$, and let the function $f$ satisfy the following conditions:

(i) $f$ is defined and twice continuously differentiable on $I$;
(ii) $f(x) > a$, $x \in I$;
(iii) $f'(x) < 0$, $x \in I$;
(iv) $f''(x) > 0$, $x \in I$.

Then algorithm 4.12 defines a sequence which converges to the (only) solution $s$ of $x = f(x)$ for any choice of the starting value $x^{(0)}$ in $I$.


Proof. By virtue of the hypotheses, the graph of the function $f$ looks as indicated in figure 4.13 (where $a = 0$); it shows that the equation $x = f(x)$ has a unique solution $s$.

Figure 4.13

† This section may be omitted at first reading.


Let $x > s$. Then, by virtue of (iii), $f(x) < s$, and $f(f(x)) > s$. Hence, by application of the mean value theorem,

$$f(x) - s = f(x) - f(s) = f'(t_1)(x - s),$$

where $s < t_1 < x$. Furthermore,

$$f(f(x)) - s = f(f(x)) - f(s) = f'(t_2)(f(x) - s),$$

where $f(x) < t_2 < s$. We thus find

$$f(x) - x = f(x) - s - (x - s) = [f'(t_1) - 1](x - s),$$

$$f(f(x)) - f(x) = f(f(x)) - s - [f(x) - s] = [f'(t_2) - 1](f(x) - s) = [f'(t_2) - 1]f'(t_1)(x - s).$$

Thus the expression $N$ in equation (4-31) is given by

$$f(f(x)) - 2f(x) + x = [f(f(x)) - f(x)] - [f(x) - x] = \{[f'(t_2) - 1]f'(t_1) - [f'(t_1) - 1]\}(x - s).$$

The expression inside the braces may be written

$$[f'(t_1) - 1]^2 + [f'(t_2) - f'(t_1)]f'(t_1).$$

This is positive, since by (iv), $f'(t_2) < f'(t_1)$ and hence by (iii),

$$[f'(t_2) - f'(t_1)]f'(t_1) > 0.$$

Hence $N(x) \neq 0$, and the first definition of $F$ applies for all $x > s$. Since $N$ is a continuous function of $x$, it follows that $F$ is continuous at each point $x > s$. (Continuity at $x = s$ has already been established in the preceding section under wider assumptions.)

Using the above work, we find for $x > s$

$$F(x) - s = (1 - Q)(x - s),$$

where

$$Q = \frac{[f'(t_1) - 1]^2}{[f'(t_1) - 1]^2 + [f'(t_2) - f'(t_1)]f'(t_1)}. \tag{4-34}$$

By the above, $0 < Q < 1$, and it follows that

$$0 < F(x) - s < x - s$$

or

$$s < F(x) < x.$$

Thus for $x^{(0)} > s$, the sequence $\{x^{(k)}\}$ is decreasing and bounded below by $s$. It therefore must have a limit $q \ge s$. By the continuity of $F$, $F(q) = q$. This is possible only for $q = s$. Thus the relation

$$\lim_{k \to \infty} x^{(k)} = s$$

has been proved for the case $x^{(0)} > s$.

If $x^{(0)} = s$, then all $x^{(k)} = s$, and the result is trivial. Thus let now $x^{(0)} < s$. In that situation, $f(x) > s$, $f(f(x)) < s$. The above computations remain valid, with the difference that now

$$x < t_1 < s < t_2 < f(x).$$

We find again $N(x) \neq 0$; however, since the order of $t_1$ and $t_2$ is now reversed, the numerator in (4-34) now exceeds the denominator, and we have $Q > 1$. It follows that $F(x) > s$. Thus if $x^{(0)} < s$, then $x^{(1)} = F(x^{(0)}) > s$, and the sequence $\{x^{(k)}\}$ decreases from $k = 1$ onward. Convergence to $s$ now follows as in the case $x^{(0)} > s$.


EXAMPLES 


10. The hypotheses of theorem 4.13 are satisfied for $f(x) = e^{-x}$, $x > 0$.
Steffensen iteration thus will produce the solution of $x = e^{-x}$ for any
choice of the starting value $x^{(0)} > 0$.

11. Let $c > 0$, and put $f(x) = c/x$ for $x > 0$. Again this function
satisfies the hypotheses of theorem 4.13, and the Steffensen iteration
converges to the solution $s = \sqrt{c}$ for any choice of $x^{(0)} > 0$. Since
$f(f(x)) = x$, the function F is given by

$$F(x) = x - \frac{(c/x - x)^2}{x - 2c/x + x} = \frac{1}{2}\left(x + \frac{c}{x}\right),$$

and the iteration function is identical with that given by Newton’s method.
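
The two examples can be tried numerically. The following Python sketch is an illustration only, not part of the original text: the function names `steffensen_step` and `steffensen`, the tolerance, and the starting values are assumptions chosen for the demonstration. It applies the iteration function $F(x) = x - [f(x) - x]^2 / [f(f(x)) - 2f(x) + x]$ to example 10 and compares one Steffensen step with one Newton step for example 11.

```python
import math

def steffensen_step(f, x):
    """One step x -> F(x) = x - (f(x) - x)^2 / (f(f(x)) - 2 f(x) + x)."""
    fx = f(x)
    ffx = f(fx)
    denom = ffx - 2.0 * fx + x      # the quantity N(x) of the text
    if denom == 0.0:                # guard: x is numerically a fixed point
        return fx
    return x - (fx - x) ** 2 / denom

def steffensen(f, x0, tol=1e-14, max_steps=50):
    """Repeat Steffensen steps until two iterates agree to within tol."""
    x = x0
    for _ in range(max_steps):
        x_new = steffensen_step(f, x)
        if abs(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

# Example 10: solve x = exp(-x), starting far from the solution.
root = steffensen(lambda x: math.exp(-x), 5.0)

# Example 11: for f(x) = c/x one Steffensen step equals one Newton step.
c, x = 2.0, 3.0
steff = steffensen_step(lambda t: c / t, x)
newton = 0.5 * (x + c / x)
```

The guard on `denom` is a practical addition; the theorem itself shows $N(x) \ne 0$ for $x \ne s$ in exact arithmetic.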


Problems

38. Prove results analogous to theorem 4.13 for the cases

(a) $0 \le f'(x) < 1$, $f''(x) > 0$;
(b) $f'(x) > 1$, $f''(x) > 0$.

39. Show that the function $f(x) = A + B/x^n$ satisfies the hypotheses of
theorem 4.13 for $x \ge A > 0$, $B > 0$. Hence find the solution $s > 1$ of
the equation $x^3 = x^2 + 1$.


96 elements of numerical analysis 


Recommended Reading 

The theory of iteration of functions of one variable is treated very 
thoroughly in the books by Householder [1953] and Ostrowski [1960]. 
Householder also gives numerous references to earlier work. 


Research Problem 

Find bounds (exhibiting the quadratic nature of the convergence) for 
the error of the approximations generated by Steffensen’s iteration. 
(Similar bounds are known for Newton's method, see for example, 
Ostrowski [1960].) 


chapter 5 iteration for systems of equations 


In this chapter we will show how to apply the basic iteration algorithm 
discussed in chapter 4 and its modifications to obtain solutions of systems 
of equations. The equations envisaged here are nonlinear. Inasmuch as 
linear equations are a special case of nonlinear ones, the algorithms 
discussed here are naturally applicable to linear systems. However, for 
the reasons mentioned in the preface, the many important algorithms
that are especially designed for the solution of linear systems of equations
are not treated in this book.


5.1 Notation 


The algorithms discussed in this chapter are, in principle, applicable to
problems involving any number of equations and unknowns. However,
for greater concreteness, and also in order to avoid cumbersome notation,
we shall consider explicitly only the case of two equations with two
unknowns. These equations will usually be written in the form

(5-1)  $x = f(x, y)$,
       $y = g(x, y)$,

where f and g are certain functions of the point (x, y) that are defined in
suitable regions of the plane. Each of the two equations $x - f(x, y) = 0$
and $y - g(x, y) = 0$ defines, in general, a curve in the (x, y) plane. The
problem of solving the system of equations (5-1) is equivalent to the problem
of finding the point or points of intersection of these curves. It will
usually be assumed, and in some cases also proved, that such a point of
intersection exists. Its coordinates will be denoted by s and t. The
quantities s and t then satisfy the relations

$s = f(s, t)$,
$t = g(s, t)$.




EXAMPLE
1. Let $f(x, y) = x^2 + y^2$, $g(x, y) = x^2 - y^2$. The system (5-1) reads in
this case

$x = x^2 + y^2$,
$y = x^2 - y^2$.

The equation $x^2 + y^2 - x = 0$ defines a circle centered at $(\tfrac{1}{2}, 0)$, the
equation $x^2 - y^2 - y = 0$ a hyperbola centered at $(0, -\tfrac{1}{2})$. Both the
circle and the hyperbola pass through the origin, thus our system has
the obvious solution $s = t = 0$. But an inspection of the graphs shows
that there must be another solution near $x = 0.8$, $y = 0.4$.


It will be convenient to employ vector notation. Thus we not only shall
be able to simplify the writing of our equations, but also to aid the
understanding of the theoretical analysis and even the programming of our
algorithms. We represent the coordinates of the point (x, y) by the
column vector

$$\mathbf{x} = \begin{pmatrix} x \\ y \end{pmatrix}.$$

The functions f and g then become functions of the vector $\mathbf{x}$, whose value
for a particular $\mathbf{x}$ we denote by $f(\mathbf{x})$ and $g(\mathbf{x})$. If we denote by $\mathbf{f}$ the
column vector with components f and g, the system (5-1) can be written
more simply as follows:

(5-2)  $\mathbf{x} = \mathbf{f}(\mathbf{x})$.

The fact that the vector

$$\mathbf{s} = \begin{pmatrix} s \\ t \end{pmatrix}$$

is a solution of equation (5-1) is expressed in the form
$\mathbf{s} = \mathbf{f}(\mathbf{s})$.


A vector analog of the absolute value of a number (or scalar, as numbers
are called in this context) will be required. Clearly, the length of a vector
is such an analog. If $\mathbf{x} = (x, y)$, we write

(5-3)  $\|\mathbf{x}\| = \sqrt{x^2 + y^2}$.

The quantity $\|\mathbf{x}\|$ is called the Euclidean norm of the vector $\mathbf{x}$. It is
nonnegative, and zero only if $\mathbf{x}$ is the zero vector $\mathbf{0} = (0, 0)$. Furthermore,
if the sum of two vectors and the product of a vector and a scalar are
defined in the natural way, we have

(5-4)  $\|c\mathbf{x}\| = |c| \, \|\mathbf{x}\|$,
(5-5)  $\|\mathbf{x}_1 + \mathbf{x}_2\| \le \|\mathbf{x}_1\| + \|\mathbf{x}_2\|$.

Relation (5-5) is called the triangle inequality. If $\mathbf{x}_1$ and $\mathbf{x}_2$ are two
vectors, then $\|\mathbf{x}_1 - \mathbf{x}_2\|$ is the distance of the points whose coordinates are
the components of $\mathbf{x}_1$ and $\mathbf{x}_2$.

We shall have occasion to consider sequences of vectors $\{\mathbf{x}_n\}$. Such a
sequence is said to converge to a vector $\mathbf{v}$, if

$$\|\mathbf{x}_n - \mathbf{v}\| \to 0 \quad \text{for } n \to \infty.$$

The following criterion due to Cauchy is necessary and sufficient for the
convergence of the sequence $\{\mathbf{x}_n\}$ to some vector $\mathbf{v}$ (see Buck [1956], p. 13).
Given any number $\varepsilon > 0$, there exists an integer N so that for all $n > N$
and all $m > N$

$$\|\mathbf{x}_m - \mathbf{x}_n\| < \varepsilon.$$


5.2 A Theorem on Contracting Maps 
Iteration in several variables is defined as in the case of one variable. 


Algorithm 5.2 Choose a vector $\mathbf{x}_0$, and calculate the sequence of
vectors $\{\mathbf{x}_n\}$ recursively by

(5-6)  $\mathbf{x}_n = \mathbf{f}(\mathbf{x}_{n-1})$,  $n = 1, 2, \ldots$.

As in the scalar situation, there arise the questions whether the sequence
$\{\mathbf{x}_n\}$ is well defined, whether it converges, and whether its limit necessarily
is a solution of the equation

(5-7)  $\mathbf{x} = \mathbf{f}(\mathbf{x})$.

All these questions are answered by the following result:

Theorem 5.2 Let R denote the rectangular region $a \le x \le b$,
$c \le y \le d$, and let the functions f and g satisfy the following
conditions:

(i) f and g are defined and continuous on R;

(ii) For each $\mathbf{x} \in R$, the point $(f(\mathbf{x}), g(\mathbf{x}))$ also lies in R;

(iii) There exists a constant $L < 1$ such that for any two points $\mathbf{x}_1$
and $\mathbf{x}_2$ in R the following inequality holds:

(5-8)  $\|\mathbf{f}(\mathbf{x}_1) - \mathbf{f}(\mathbf{x}_2)\| \le L \|\mathbf{x}_1 - \mathbf{x}_2\|$.

Then the following statements are true:

(a) Equation (5-7) has precisely one solution $\mathbf{s}$ in R;

(b) for any choice of $\mathbf{x}_0$ in R, the sequence $\{\mathbf{x}_n\}$ given by algorithm 5.2
is defined and converges to $\mathbf{s}$;

(c) for any $n = 1, 2, \ldots$, the following inequality holds:

(5-9)  $\|\mathbf{x}_n - \mathbf{s}\| \le \dfrac{L^n}{1 - L} \|\mathbf{x}_1 - \mathbf{x}_0\|$.


Condition (5-8) is again called a Lipschitz condition. It here expresses
the fact that the mapping $\mathbf{x} \to \mathbf{f}(\mathbf{x})$ diminishes the distance between any
two points in R at least by the factor L. For this reason a mapping with
the property (5-8) is called a contraction mapping.

It should be noted that already statement (a) is not as trivial now as in
the case of one variable, where the existence of a solution could be inferred
from an inspection of the graph of the function f. Statement (b) guarantees
convergence of the algorithm, while (c) gives an upper bound for the
error after n steps.

Proof of theorem 5.2. The proof will be accomplished in several stages.
From condition (ii) it is evident that the sequence $\{\mathbf{x}_n\}$ is defined, and that
its elements lie in R. Proceeding exactly as in the proof of (4-6) (§4.3)
we now can show that (iii) implies

(5-10)  $\|\mathbf{x}_{n+1} - \mathbf{x}_n\| \le L^n \|\mathbf{x}_1 - \mathbf{x}_0\|$,  $n = 0, 1, 2, \ldots$.

Now let n be a fixed positive integer, and let $m > n$. With a view towards
applying the Cauchy criterion, we shall find a bound for $\|\mathbf{x}_m - \mathbf{x}_n\|$.
Writing

$$\mathbf{x}_m - \mathbf{x}_n = (\mathbf{x}_m - \mathbf{x}_{m-1}) + (\mathbf{x}_{m-1} - \mathbf{x}_{m-2}) + \cdots + (\mathbf{x}_{n+1} - \mathbf{x}_n)$$

and applying the triangle inequality (5-5), we obtain

$$\|\mathbf{x}_m - \mathbf{x}_n\| \le \|\mathbf{x}_m - \mathbf{x}_{m-1}\| + \|\mathbf{x}_{m-1} - \mathbf{x}_{m-2}\| + \cdots + \|\mathbf{x}_{n+1} - \mathbf{x}_n\|$$

and, using (5-10) to estimate each term on the right,

(5-11)  $\|\mathbf{x}_m - \mathbf{x}_n\| \le (L^{m-1} + L^{m-2} + \cdots + L^n) \|\mathbf{x}_1 - \mathbf{x}_0\| \le \dfrac{L^n}{1 - L} \|\mathbf{x}_1 - \mathbf{x}_0\|$

by virtue of $0 \le L < 1$. Since the expression on the right does not depend
on m and tends to zero as $n \to \infty$, we have established that the sequence
$\{\mathbf{x}_n\}$ satisfies the Cauchy criterion. It thus has a limit $\mathbf{s}$. Since R is
compact, $\mathbf{s} \in R$.

We next show that $\mathbf{s}$ is a solution of (5-7). By virtue of the continuity
of f and g,

(5-12)  $\lim_{n \to \infty} \mathbf{f}(\mathbf{x}_n) = \mathbf{f}(\mathbf{s})$,

and thus

$$\mathbf{s} = \lim_{n \to \infty} \mathbf{x}_{n+1} = \lim_{n \to \infty} \mathbf{f}(\mathbf{x}_n) = \mathbf{f}(\mathbf{s}),$$

as desired.

Uniqueness of the solution $\mathbf{s}$ follows from the Lipschitz condition (iii),
exactly as it does for one variable (see the end of §4.1). Relation (5-9)
finally follows by letting $m \to \infty$ in (5-11). This completes the proof of
theorem 5.2. No use has been made of the fact that the number of
equations and unknowns is two; both the theorem and its proof remain
valid in a Euclidean space of arbitrary dimension, and even in some
infinite-dimensional spaces.

Problems

1. Prove the relation (5-12) by means of (iii), without making use of the
continuity of f and g.

2. Prove that condition (iii) (even when $L \ge 1$) implies that both f and g are
continuous at every point of R.

5.3 A Bound for the Lipschitz Constant†

† A reader who is willing to accept the statement of Theorem 5.3 may omit the
remainder of the section without loss of continuity.

We consider here the problem of verifying whether a given pair of
functions (f, g) satisfies a Lipschitz condition of the form (5-8). The
corresponding condition for a single function of one variable could be very
easily checked by computing the derivative (see §4.1). If the absolute
value of the derivative turned out to be bounded by a constant, then the
Lipschitz condition was satisfied with that constant. A similar criterion is
available for pairs of functions of two variables.

Theorem 5.3 Let the functions f and g have continuous partial
derivatives in the region R defined in theorem 5.2. Then the inequality
(5-8) holds with $L = J$, where

(5-13)  $J = \max_{(x, y) \in R} \sqrt{f_x^2 + f_y^2 + g_x^2 + g_y^2}$.

Proof. Let $(x_0, y_0)$ and $(x_1, y_1)$ be two points in R. We define for
$0 \le t \le 1$

$$x_t = x_0 + th, \quad y_t = y_0 + tk,$$

where $h = x_1 - x_0$, $k = y_1 - y_0$, and set

$$f_t = f(x_t, y_t), \quad g_t = g(x_t, y_t).$$

It is to be proved that

(5-14)  $(f_1 - f_0)^2 + (g_1 - g_0)^2 \le (h^2 + k^2) J^2$.

For the proof we shall require both the Schwarz inequality for sums,

(5-15)  $(ac + bd)^2 \le (a^2 + b^2)(c^2 + d^2)$,




valid for any four real numbers a, b, c, d, and the Schwarz inequality for
integrals,

(5-16)  $\left( \int_a^b p(x) q(x) \, dx \right)^2 \le \int_a^b [p(x)]^2 \, dx \int_a^b [q(x)]^2 \, dx$,

valid for any two functions p and q that are integrable on the interval
[a, b]. Proofs of these well-known inequalities are sketched in
problems 3 and 4.

By the chain rule of differentiation (see Taylor [1959], p. 590) we have

$$\frac{df_t}{dt} = h f_x + k f_y,$$

where $f_x = f_x(x_t, y_t)$, $f_y = f_y(x_t, y_t)$, and hence

$$f_1 - f_0 = \int_0^1 (h f_x + k f_y) \, dt.$$

By (5-15),

$$(h f_x + k f_y)^2 \le (h^2 + k^2)(f_x^2 + f_y^2).$$

Hence

$$(f_1 - f_0)^2 \le (h^2 + k^2) \left[ \int_0^1 \sqrt{f_x^2 + f_y^2} \, dt \right]^2.$$

We now use (5-16) where $a = 0$, $b = 1$, $p(t) = 1$, $q(t) = \sqrt{f_x^2 + f_y^2}$. It
follows that

$$(f_1 - f_0)^2 \le (h^2 + k^2) \int_0^1 (f_x^2 + f_y^2) \, dt.$$

In exactly the same manner we find

$$(g_1 - g_0)^2 \le (h^2 + k^2) \int_0^1 (g_x^2 + g_y^2) \, dt.$$

Adding the last two inequalities and using $f_x^2 + f_y^2 + g_x^2 + g_y^2 \le J^2$, we
obtain (5-14).


EXAMPLE
2. Let

$$f(x, y) = A \sin x + B \cos y, \quad g(x, y) = A \cos x - B \sin y,$$

where A and B are constants. We find

$$f_x^2 + f_y^2 + g_x^2 + g_y^2 = A^2 + B^2;$$

thus (5-8) holds with $L = \sqrt{A^2 + B^2}$. The conditions of theorem 5.2
thus are satisfied, for instance for $a = -2$, $b = 2$, whenever $A^2 + B^2 < 1$.
Convergence of the process for $A = 0.7$, $B = 0.2$, $x_0 = y_0 = 0$ is
illustrated by the values given in table 5.3.

Table 5.3
  n      x_n         y_n
  1    0.200000    0.700000
  2    0.292037    0.557203
  3    0.371280    0.564599
  4    0.422937    0.545289
 28    0.526519    0.507921
 29    0.526521    0.507921
 30    0.526521    0.507920
 31    0.526522    0.507920
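
The entries of table 5.3 can be regenerated with a few lines of Python. This is a sketch under stated assumptions, not the author's program: the names `iterate` and `a_priori_bound` and the choice of 200 steps are illustrative. It runs algorithm 5.2 on example 2 and evaluates the a priori bound (5-9) with $L = \sqrt{A^2 + B^2}$.

```python
import math

A, B = 0.7, 0.2

def f(x, y):
    return A * math.sin(x) + B * math.cos(y)

def g(x, y):
    return A * math.cos(x) - B * math.sin(y)

def iterate(x0, y0, steps):
    """Algorithm 5.2: return the points (x_0, y_0), ..., (x_steps, y_steps)."""
    pts = [(x0, y0)]
    for _ in range(steps):
        x, y = pts[-1]
        pts.append((f(x, y), g(x, y)))
    return pts

def dist(p, q):
    """Euclidean norm (5-3) of the difference of two points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

L = math.sqrt(A * A + B * B)        # Lipschitz constant from theorem 5.3
pts = iterate(0.0, 0.0, 200)
limit = pts[-1]                     # numerically converged after 200 steps

def a_priori_bound(n):
    """Right-hand side of (5-9)."""
    return L ** n / (1.0 - L) * dist(pts[1], pts[0])

for n in (1, 2, 3, 4, 28, 29, 30, 31):
    print(n, round(pts[n][0], 6), round(pts[n][1], 6))
```

The bound (5-9) is typically pessimistic: it uses the worst-case contraction factor L over all of R, whereas the observed error shrinks at the rate set by the Jacobian near the solution.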


Naturally, the condition of theorem 5.3 is not necessary for convergence 
of the iterative process. For instance for the problem considered in 
example 2 convergence also takes place for A = B = 0.71. 

Problems
3. Prove (5-15) by observing that the equation for x,
$$(a + cx)^2 + (b + dx)^2 = 0,$$
can have at most one real solution.

4. Prove (5-16) by noting that the equation in x,

$$\int_a^b (p(t) + x q(t))^2 \, dt = 0,$$

can have at most one real solution. What are the conditions on p and q
in order that there is a real solution?


5.4 Quadratic Convergence

Suppose the functions f and g satisfy the conditions of theorem 5.2 and
have continuous derivatives up to order 2 in R. The sequence of points
$(x_n, y_n)$ defined by algorithm 5.2 then converges to a solution (s, t) of the
system (5-1). What is the behavior of the errors

$$d_n = x_n - s, \quad e_n = y_n - t$$

as $n \to \infty$?

An application of Taylor’s theorem for functions of two variables (see
Buck [1956], p. 200) shows that

$$d_{n+1} = x_{n+1} - s = f(x_n, y_n) - f(s, t) = f(s + d_n, t + e_n) - f(s, t)$$
$$= f_x(s, t) d_n + f_y(s, t) e_n + O(\|\mathbf{d}_n\|^2),$$

and similarly

$$e_{n+1} = g_x(s, t) d_n + g_y(s, t) e_n + O(\|\mathbf{d}_n\|^2).$$

Here $\mathbf{d}_n$ denotes the vector of errors,

$$\mathbf{d}_n = \begin{pmatrix} d_n \\ e_n \end{pmatrix},$$

and $O(\|\mathbf{d}_n\|^2)$ denotes a quantity bounded by $C \|\mathbf{d}_n\|^2$. Introducing the
Jacobian matrix of the functions f and g,

$$\mathbf{J} = \begin{pmatrix} f_x & f_y \\ g_x & g_y \end{pmatrix},$$

the above relations can be written in abbreviated form as follows:

(5-17)  $\mathbf{d}_{n+1} = \mathbf{J}(s, t) \mathbf{d}_n + O(\|\mathbf{d}_n\|^2)$.


Relation (5-17) is the multidimensional generalization of (4-8). If
$\mathbf{J}(s, t) \ne \mathbf{0}$ (that is, if the elements of the matrix $\mathbf{J}(s, t)$ are not all zero), it
shows that at each step of the iteration the error vector is approximately
multiplied by a constant matrix. In this sense we again may speak of
linear convergence. If $\mathbf{J}(s, t) = \mathbf{0}$ (that is, if all four elements of $\mathbf{J}$ are
zero), then we see that the norm of the error at the (n + 1)st step is of the
order of the square of the norm of the error at the nth step. This is
similar to what earlier has been called quadratic convergence.

The following analog of theorem 4.6 holds in the case where $\mathbf{J}(s, t) = \mathbf{0}$:

Theorem 5.4 Let the functions f and g be defined in a region R, and
let them satisfy the following conditions:
(i) The first partial derivatives of f and g exist and are continuous
in R.
(ii) The system (5-1) has a solution (s, t) in the interior of R such that
$\mathbf{J}(s, t) = \mathbf{0}$.

Then there exists a number $d > 0$ such that algorithm 5.2 converges
to (s, t) for any choice of the starting point within the distance d of
the solution.

The conclusion can be expressed by saying that the algorithm always
converges if the starting vector is “sufficiently close” to the solution
vector. The proof of theorem 5.4 is based on the fact that by virtue of
the continuity of the first partial derivatives

$$\sqrt{f_x^2 + f_y^2 + g_x^2 + g_y^2} \le L < 1$$

in a certain neighborhood of $(s, t) \in R$. It then follows from theorem 5.3
that the conditions of theorem 5.2 are satisfied in that neighborhood.
Further details are omitted.


5.5 Newton’s Method for Systems of Equations 


We now shall consider the problem of solving systems of two equations
with two unknowns which are of the form

(5-18)  $F(x, y) = 0$,
        $G(x, y) = 0$,

where both functions F and G are defined and twice continuously
differentiable on a certain rectangle R of the (x, y) plane. We suppose that the
system (5-18) has a solution (s, t) in the interior of R, and that the Jacobian
determinant

(5-19)  $D(x, y) = \begin{vmatrix} F_x(x, y) & F_y(x, y) \\ G_x(x, y) & G_y(x, y) \end{vmatrix}$

is different from zero when (x, y) = (s, t). It then follows from a theorem
of calculus (see Buck [1956], p. 216) that the system (5-18) has no solution
other than (s, t) in a certain neighborhood of the point (s, t).

Newton’s method for solving a single equation $F(x) = 0$ could be
understood as arising from replacing $F(x + \delta)$ by $F(x) + F'(x)\delta$ and
solving the equation

$$F(x) + F'(x)\delta = 0$$

for $\delta$, thus obtaining a supposedly better approximation $x + \delta$ to s than x.
We now apply the same principle to the system (5-18). Assuming that
(x, y) is a point “near” the desired solution (s, t), we replace the function
$F(x + \delta, y + \varepsilon)$ by its first degree Taylor polynomial at the point (x, y),
that is, by $F(x, y) + F_x(x, y)\delta + F_y(x, y)\varepsilon$. A similar replacement is
made for the function $G(x + \delta, y + \varepsilon)$. Setting the Taylor polynomials
equal to zero, we obtain a system of two linear equations for $\delta$ and $\varepsilon$,

(5-20)  $F_x(x, y)\delta + F_y(x, y)\varepsilon = -F(x, y)$,
        $G_x(x, y)\delta + G_y(x, y)\varepsilon = -G(x, y)$.

The determinant of this system is just the quantity D(x, y) defined by
(5-19). Since D is continuous and $D(s, t) \ne 0$ by hypothesis, it follows
that $D(x, y) \ne 0$ for all points (x, y) sufficiently close to (s, t). Thus for
all these (x, y) the system (5-20) has a unique solution $\delta = \delta(x, y)$,
$\varepsilon = \varepsilon(x, y)$. The point $(x + \delta(x, y), y + \varepsilon(x, y))$ is now chosen as the next
approximation to (s, t). Algorithmically speaking, the procedure can be
described as follows:
Algorithm 5.5 (Newton’s method for two variables.) Choose
$(x_0, y_0)$, and determine the sequence of points $(x_n, y_n)$ by

(5-21)  $x_{n+1} = f(x_n, y_n)$,
        $y_{n+1} = g(x_n, y_n)$,

where the functions f and g are defined by

$$f(x, y) = x + \delta(x, y), \quad g(x, y) = y + \varepsilon(x, y),$$

and where $\delta$ and $\varepsilon$ denote the solution of the linear system (5-20).

We assert that the sequence $(x_n, y_n)$ converges quadratically to (s, t)
for all $(x_0, y_0)$ sufficiently close to (s, t). In order to verify this statement,
it suffices by theorem 5.4 to show that all elements of the Jacobian matrix
of the iteration functions f and g,

$$\mathbf{J}(x, y) = \begin{pmatrix} f_x(x, y) & f_y(x, y) \\ g_x(x, y) & g_y(x, y) \end{pmatrix},$$

are zero for (x, y) = (s, t). Omitting arguments, we have

(5-22)  $f_x = 1 + \delta_x$,  $f_y = \delta_y$,
        $g_x = \varepsilon_x$,  $g_y = 1 + \varepsilon_y$.

The values of the derivatives $\delta_x$, $\delta_y$, $\varepsilon_x$, $\varepsilon_y$ are best determined from (5-20)
by implicit differentiation. Differentiating the first of these equations, we
get

$$F_{xx}\delta + F_x\delta_x + F_{xy}\varepsilon + F_y\varepsilon_x = -F_x,$$
$$F_{xy}\delta + F_x\delta_y + F_{yy}\varepsilon + F_y\varepsilon_y = -F_y.$$

Two similar relations involving the function G are obtained by differentiating
the second relation. We now set (x, y) = (s, t) and observe that
$\delta(s, t) = \varepsilon(s, t) = 0$. We thus obtain for (x, y) = (s, t)

$$F_x\delta_x + F_y\varepsilon_x = -F_x,$$
$$G_x\delta_x + G_y\varepsilon_x = -G_x,$$
$$F_x\delta_y + F_y\varepsilon_y = -F_y,$$
$$G_x\delta_y + G_y\varepsilon_y = -G_y.$$

The determinant of each of these two systems of linear equations is again
the Jacobian determinant D(s, t) and hence is different from zero. We
now easily find

$$\delta_x = -1, \quad \varepsilon_x = 0, \quad \delta_y = 0, \quad \varepsilon_y = -1.$$

According to equation (5-22), this implies that all elements of the matrix $\mathbf{J}$
are zero at the point (s, t), as desired.

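For actual execution of algorithm 5.5 (although not for the above
theoretical analysis) it is necessary at each step of the iteration to solve
the system (5-20) for $\delta$ and $\varepsilon$. An application of Cramer’s rule† yields

$$\delta = \frac{\begin{vmatrix} -F & F_y \\ -G & G_y \end{vmatrix}}{\begin{vmatrix} F_x & F_y \\ G_x & G_y \end{vmatrix}} = \frac{G F_y - F G_y}{F_x G_y - F_y G_x},$$

$$\varepsilon = \frac{\begin{vmatrix} F_x & -F \\ G_x & -G \end{vmatrix}}{\begin{vmatrix} F_x & F_y \\ G_x & G_y \end{vmatrix}} = \frac{F G_x - G F_x}{F_x G_y - F_y G_x}.$$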

EXAMPLE

3. We solve the system considered in example 2 by writing it in the form

$$x - A \sin x - B \cos y = 0,$$
$$y - A \cos x + B \sin y = 0,$$

and applying Newton’s method. The following values are obtained for
$A = 0.7$, $B = 0.2$, $x_0 = y_0 = 0$:

  n      x_n          y_n
  1    0.6666667    0.5833333
  2    0.5362400    0.5088490
  3    0.5265620    0.5079319
  4    0.5265226    0.5079197
  5    0.5265226    0.5079197

The much greater rapidity of convergence is clearly evident.

† It is well known that Cramer’s rule should never be used for the solution of linear
systems of any sizable order. The solution of such systems is much more
conveniently found by a process of elimination. For systems of small order (such as
2 or 3) Cramer’s rule is perfectly applicable, however.
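
Example 3 can be reproduced with the following Python sketch of algorithm 5.5. It is an illustration, not the author's program: the function names `partials` and `newton_step` are assumptions, and the partial derivatives were worked out by hand from the two equations of the example. Each step solves (5-20) by the Cramer formulas given above.

```python
import math

A, B = 0.7, 0.2

def F(x, y):
    return x - A * math.sin(x) - B * math.cos(y)

def G(x, y):
    return y - A * math.cos(x) + B * math.sin(y)

def partials(x, y):
    """Return F_x, F_y, G_x, G_y for the system of example 3."""
    return (1.0 - A * math.cos(x),   # F_x
            B * math.sin(y),         # F_y
            A * math.sin(x),         # G_x
            1.0 + B * math.cos(y))   # G_y

def newton_step(x, y):
    """One step of algorithm 5.5, solving (5-20) by Cramer's rule."""
    Fv, Gv = F(x, y), G(x, y)
    Fx, Fy, Gx, Gy = partials(x, y)
    D = Fx * Gy - Fy * Gx            # Jacobian determinant (5-19)
    delta = (Gv * Fy - Fv * Gy) / D
    eps = (Fv * Gx - Gv * Fx) / D
    return x + delta, y + eps

points = [(0.0, 0.0)]
for _ in range(5):
    points.append(newton_step(*points[-1]))

for n, (x, y) in enumerate(points[1:], start=1):
    print(n, f"{x:.7f}", f"{y:.7f}")
```

No safeguard against $D = 0$ is included; the text guarantees $D \ne 0$ only for points sufficiently close to the solution.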


Problems

5. Generalizing the approach outlined in §4.7, one might try to generalize
Newton’s method to systems by suitably choosing functions h and k so
that iteration of the functions

$$f(x, y) = x + h(x, y) F(x, y),$$
$$g(x, y) = y + k(x, y) G(x, y)$$

yields a quadratically convergent process. Show that this is not possible
in general.

6. Determine the matrix $\mathbf{H} = \mathbf{H}(\mathbf{x})$ such that application of iteration to

$$\mathbf{f}(\mathbf{x}) = \mathbf{x} + \mathbf{H}(\mathbf{x}) \mathbf{F}(\mathbf{x})$$

yields [...]

7. [...] of the system

[...]

using Newton’s method.

8. Solve by Newton’s method the pair of equations

$$x + 13 \log x - y^2 = 0,$$
[...]

starting with $x_0 = 3.4$, $y_0 = 2.2$.

9. [...]

$$4x^3 - 27xy^2 + 25 = 0,$$
[...]

by Newton’s method, starting with $x_0 = y_0 = 1$.

5.6 The Determination of Quadratic Factors

In this section we shall prepare the ground for the application of
Newton’s method to the problem of finding pairs of complex conjugate
zeros of polynomials with real coefficients. We first consider the following
preliminary problem.

Given a polynomial p of degree $N \ge 2$ with real coefficients,

$$p(x) = a_0 x^N + a_1 x^{N-1} + \cdots + a_N,$$

and a quadratic polynomial $x^2 - ux - v$, to determine constants
$b_0, b_1, \ldots, b_N$ such that the identity

(5-23)  $p(x) = (x^2 - ux - v) q(x) + b_{N-1}(x - u) + b_N$

holds, where

$$q(x) = b_0 x^{N-2} + b_1 x^{N-3} + \cdots + b_{N-2}.$$

Multiplying the polynomials on the right side of (5-23) and comparing
coefficients of like powers of x, we find the following conditions on
$b_0, b_1, \ldots, b_N$:

$$b_0 = a_0,$$
$$b_1 = a_1 + u b_0,$$
$$b_2 = a_2 + u b_1 + v b_0,$$

and generally for $n = 2, \ldots, N - 2$

(5-24)  $b_n = a_n + u b_{n-1} + v b_{n-2}$.

By comparing the coefficients of $x^1$ and $x^0$ it is seen that (5-24) also holds
for $n = N - 1$ and $n = N$. Furthermore, if we agree to put $b_{-1} = b_{-2}
= 0$, the relation holds also for $n = 0$ and $n = 1$. Thus we find the
following algorithm for determining the coefficients $b_0, b_1, \ldots, b_N$ in the
representation (5-23) of the polynomial p:

Algorithm 5.6 If $a_0, a_1, \ldots, a_N$, u, v are given numbers, determine
$b_0, b_1, \ldots, b_N$ from the recurrence relation $b_{-2} = b_{-1} = 0$,

$$b_n = a_n + u b_{n-1} + v b_{n-2}, \quad n = 0, 1, \ldots, N.$$

Clearly, the coefficients $b_0, b_1, \ldots, b_N$ thus obtained are functions of the
variables u and v. The reason for considering the $b_n$ is contained in the
following theorem:

Theorem 5.6 The polynomial $x^2 - ux - v$ is a quadratic factor of
the real polynomial $p(x) = a_0 x^N + a_1 x^{N-1} + \cdots + a_N$ if and only if
$b_{N-1} = b_N = 0$.

Proof. (a) If $b_{N-1} = b_N = 0$, then (5-23) reduces to the relation

$$p(x) = (x^2 - ux - v) q(x).$$

It shows that a zero of $x^2 - ux - v$ is also a zero of p. By considering
the derivative we find that a double zero of $x^2 - ux - v$ is also a double
zero of p. Thus $x^2 - ux - v$ is a quadratic factor of p.

(b) We now assume that $x^2 - ux - v$ is a quadratic factor. Denoting
its zeros by $z_1$, $z_2$, we first assume that $z_1 \ne z_2$. We then have from (5-23),
since $z_k^2 - u z_k - v = 0$ (k = 1, 2),

$$0 = p(z_k) = b_{N-1}(z_k - u) + b_N, \quad k = 1, 2,$$

or, written more explicitly,

$$(z_1 - u) b_{N-1} + b_N = 0,$$
$$(z_2 - u) b_{N-1} + b_N = 0.$$

This homogeneous system of two linear equations has the determinant

$z_1 - z_2 \ne 0$, hence its only solution is $b_{N-1} = b_N = 0$. If the two zeros
of the quadratic factor $x^2 - ux - v$ coincide, then $p(z_1) = p'(z_1) = 0$,
hence from (5-23)

$$b_{N-1}(z_1 - u) + b_N = 0,$$
$$b_{N-1} = 0,$$

and it follows again that $b_{N-1} = b_N = 0$.
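
The recurrence of algorithm 5.6 is short enough to state as code. The Python sketch below is an illustration; the function name `quadratic_division` is an assumption, and the test polynomial is the product $(x^2 - 2x + 2)(x^2 - 6x + 25) = x^4 - 8x^3 + 39x^2 - 62x + 50$, for which theorem 5.6 predicts $b_{N-1} = b_N = 0$ when $u = 2$, $v = -2$.

```python
def quadratic_division(a, u, v):
    """Algorithm 5.6: return [b_0, ..., b_N] with
    b_n = a_n + u*b_{n-1} + v*b_{n-2} and b_{-1} = b_{-2} = 0."""
    b = []
    b_prev1 = 0.0   # holds b_{n-1}
    b_prev2 = 0.0   # holds b_{n-2}
    for a_n in a:
        b_n = a_n + u * b_prev1 + v * b_prev2
        b.append(b_n)
        b_prev2, b_prev1 = b_prev1, b_n
    return b

# x^2 - 2x + 2 is an exact factor, so the last two b's vanish.
exact = quadratic_division([1, -8, 39, -62, 50], 2, -2)

# Changing the constant term to 51 leaves the remainder b_{N-1}(x - u) + b_N = 1.
shifted = quadratic_division([1, -8, 39, -62, 51], 2, -2)
```

In the first call the leading entries 1, -6, 25 are the coefficients of the quotient $q(x) = x^2 - 6x + 25$.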

Theorem 5.6 shows that the problem of determining a quadratic factor
of the polynomial p is equivalent to the problem of determining u and v
such that

(5-25)  $b_{N-1}(u, v) = 0$,
        $b_N(u, v) = 0$.

The solution of this pair of simultaneous equations by Newton’s method
is known as Bairstow’s method.


5.7 Bairstow’s Method 
In order to apply Newton’s method to the system (5-25) we require the
partial derivatives

$$\frac{\partial b_{N-1}}{\partial u}(u, v), \quad \frac{\partial b_{N-1}}{\partial v}(u, v), \quad \frac{\partial b_N}{\partial u}(u, v), \quad \frac{\partial b_N}{\partial v}(u, v).$$

We shall obtain recurrence relations for these derivatives by differentiating
the recurrence relations (5-24).

We first differentiate with respect to u and observe that $\partial b_0 / \partial u = 0$.
Writing

(5-26)  $c_n = \dfrac{\partial b_{n+1}}{\partial u}$

for notational convenience, we obtain

$$c_0 = b_0,$$
$$c_1 = b_1 + u c_0,$$
$$c_2 = b_2 + u c_1 + v c_0,$$

and generally

$$c_n = b_n + u c_{n-1} + v c_{n-2},$$

where $n = 0, 1, 2, \ldots, N - 1$, $c_{-2} = c_{-1} = 0$. We see that the $c_n$ are
generated from the $b_n$ exactly as the $b_n$ were generated from the $a_n$. From
(5-26) we have

$$\frac{\partial b_{N-1}}{\partial u} = c_{N-2}, \quad \frac{\partial b_N}{\partial u} = c_{N-1}.$$

We next differentiate with respect to v. We observe that now
$\partial b_0 / \partial v = \partial b_1 / \partial v = 0$. Thus, writing

(5-27)  $d_n = \dfrac{\partial b_{n+2}}{\partial v}$,

we obtain $d_{-2} = d_{-1} = 0$,

$$d_0 = b_0,$$
$$d_1 = b_1 + u d_0,$$
$$d_2 = b_2 + u d_1 + v d_0,$$

and generally

$$d_n = b_n + u d_{n-1} + v d_{n-2},$$

$n = 0, 1, 2, \ldots, N - 2$. These recurrence relations and initial conditions
are exactly the same as those for the $c_n$, hence we have $d_n = c_n$, $n =
0, 1, 2, \ldots, N - 2$, and from (5-27) we obtain

$$\frac{\partial b_{N-1}}{\partial v} = c_{N-3}, \quad \frac{\partial b_N}{\partial v} = c_{N-2}.$$

If the increments of u and v are denoted by $\delta$ and $\varepsilon$, respectively, their
values as determined by Newton’s method must satisfy

$$c_{N-2}\delta + c_{N-3}\varepsilon = -b_{N-1},$$
$$c_{N-1}\delta + c_{N-2}\varepsilon = -b_N,$$

and hence are given by

(5-28)  $\delta = \dfrac{b_N c_{N-3} - b_{N-1} c_{N-2}}{c_{N-2}^2 - c_{N-1} c_{N-3}}$,  $\varepsilon = \dfrac{b_{N-1} c_{N-1} - b_N c_{N-2}}{c_{N-2}^2 - c_{N-1} c_{N-3}}$.
The whole procedure is summarized in

Algorithm 5.7 Given the polynomial

$$p(x) = a_0 x^N + a_1 x^{N-1} + a_2 x^{N-2} + \cdots + a_N$$

and an arbitrary tentative quadratic factor $x^2 - u_0 x - v_0$, determine
a sequence $\{x^2 - u_k x - v_k\}$ of quadratic factors as follows: For each
$k = 0, 1, 2, \ldots$ determine the sequence $\{b_n\} = \{b_n^{(k)}\}$ from $b_{-2} =
b_{-1} = 0$,

$$b_n = a_n + u_k b_{n-1} + v_k b_{n-2}, \quad n = 0, 1, \ldots, N,$$

and the sequence $\{c_n\} = \{c_n^{(k)}\}$ from $c_{-2} = c_{-1} = 0$,

$$c_n = b_n + u_k c_{n-1} + v_k c_{n-2}, \quad n = 0, 1, \ldots, N - 1.$$

Then set $u_{k+1} = u_k + \delta$, $v_{k+1} = v_k + \varepsilon$, where $\delta$ and $\varepsilon$ are given by
(5-28).


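Schematically the algorithm can be described as follows (each column is
generated from the column to its left by the recurrence with the
multipliers u and v):

  a_0        b_0        c_0
  a_1        b_1        c_1
  a_2        b_2        c_2
  ...        ...        ...
  a_{N-3}    b_{N-3}    c_{N-3}
  a_{N-2}    b_{N-2}    c_{N-2}
  a_{N-1}    b_{N-1}    c_{N-1}
  a_N        b_N

Scheme 5.8

The coefficients that are needed in formula (5-28) are $b_{N-1}$, $b_N$, $c_{N-3}$,
$c_{N-2}$, and $c_{N-1}$ (printed in bold face in the original scheme).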


EXAMPLE
4. To determine a quadratic factor of

$$p(x) = x^3 - x^2 + 2x + 5$$

near $x^2 - 2x + 5$. Algorithm 5.7 yields for $u_0 = 2$, $v_0 = -5$ (watch the
signs!)

  n    a_n    b_n    c_n
  0     1      1      1
  1    -1      1      3
  2     2     -1      0
  3     5     -2

The increments satisfy $3\delta + \varepsilon = 1$, $3\varepsilon = 2$, whence
$\delta = 0.11111111$, $\varepsilon = 0.66666667$.

Continuing with $u_1 = u_0 + \delta = 2.11111111$, $v_1 = v_0 + \varepsilon = -4.33333333$,
we find

  n    a_n    b_n           c_n
  0     1     1.00000000    1.00000000
  1    -1     1.11111111    3.22222222
  2     2     0.01234568    2.48148148
  3     5     0.21124830

The increments now satisfy

$$3.22222222\,\delta + \varepsilon = -0.01234568,$$
$$2.48148148\,\delta + 3.22222222\,\varepsilon = -0.2112483,$$

whence $\delta = 0.02170139$, $\varepsilon = -0.08227238$,
yielding $u_2 = 2.1328125$, $v_2 = -4.41560571$. Proceeding in a similar
manner, we find

$u_3 = 2.13249371$,  $v_3 = -4.41503560$,
$u_4 = 2.13249369$,  $v_4 = -4.41503564$.

Calculating the $b_n$ with the last values, we obtain

  n    a_n    b_n
  0     1     1.00000000
  1    -1     1.13249369
  2     2     0.00000000
  3     5     0.00000000

indicating that convergence has been achieved. The two complex zeros
are thus found to be

$$z_{1,2} = \frac{u}{2} \pm i \sqrt{-\left(\frac{u}{2}\right)^2 - v} = 1.06624685 \pm i\,1.81056712.$$
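
The computation of example 4 can be retraced with a short Python sketch of algorithm 5.7. It is illustrative only: the names `recurrence` and `bairstow_step` and the fixed count of ten steps are assumptions, and no safeguard is included for a vanishing denominator in (5-28).

```python
import cmath

def recurrence(coeffs, u, v):
    """Apply r_n = coeffs[n] + u*r_{n-1} + v*r_{n-2} with r_{-1} = r_{-2} = 0."""
    out, prev1, prev2 = [], 0.0, 0.0
    for value in coeffs:
        r = value + u * prev1 + v * prev2
        out.append(r)
        prev2, prev1 = prev1, r
    return out

def bairstow_step(a, u, v):
    """One step of algorithm 5.7 for p with coefficients a_0, ..., a_N (N >= 2)."""
    N = len(a) - 1
    b = recurrence(a, u, v)            # b_0, ..., b_N
    c = recurrence(b[:-1], u, v)       # c_0, ..., c_{N-1}
    c_N3 = c[N - 3] if N >= 3 else 0.0  # c_{-1} = 0 when N = 2
    denom = c[N - 2] ** 2 - c[N - 1] * c_N3
    delta = (b[N] * c_N3 - b[N - 1] * c[N - 2]) / denom    # formula (5-28)
    eps = (b[N - 1] * c[N - 1] - b[N] * c[N - 2]) / denom
    return u + delta, v + eps

a = [1.0, -1.0, 2.0, 5.0]              # p(x) = x^3 - x^2 + 2x + 5
u1, v1 = bairstow_step(a, 2.0, -5.0)   # first step of example 4

u, v = 2.0, -5.0
for _ in range(10):
    u, v = bairstow_step(a, u, v)

# Zeros of x^2 - u x - v.
root = cmath.sqrt((u / 2.0) ** 2 + v)
z1, z2 = u / 2.0 + root, u / 2.0 - root
```

Passing `b[:-1]` to the second recurrence reflects the fact that the $c_n$ are needed only up to $n = N - 1$.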
Problems
10. Determine all zeros of the polynomial

$$p(x) = x^4 - 8x^3 + 39x^2 - 62x + 51$$

using the fact that

$$p(x) = (x^2 - 2x + 2)(x^2 - 6x + 25) + 1.$$

11. Determine quadratic factors of the polynomial

$$p(x) = 3x^6 + 9x^5 + 9x^4 + 5x^3 + 3x^2 + 8x + 5$$

near $x^2 + 1.541451x + 1.487398$ and $x^2 - 1.127178x + 0.831492$. Also
determine the real zeros near $-1.86$ and $-0.72$.

5.8 Convergence of Bairstow’s Method 


In order to justify the application of Newton’s method to the problem
of solving (5-25) we have to show, according to §5.5, that the Jacobian
determinant

(5-29)  $D(u, v) = \begin{vmatrix} \dfrac{\partial b_{N-1}}{\partial u}(u, v) & \dfrac{\partial b_{N-1}}{\partial v}(u, v) \\ \dfrac{\partial b_N}{\partial u}(u, v) & \dfrac{\partial b_N}{\partial v}(u, v) \end{vmatrix}$

is different from zero for (u, v) = (s, t), where $x^2 - sx - t$ is a quadratic
factor of the polynomial p. One condition under which this is the case is
as follows.


Theorem 5.8 Let $x^2 - sx - t$ be a quadratic factor of the polynomial
p, and let its zeros $z_1$, $z_2$ be two distinct, simple zeros of p. Then
$D(s, t) \ne 0$.

Proof. We differentiate the identity (5-23) with respect to both u and v.
Since p does not depend on u or v, the result is

$$0 = -x q(x) + (x^2 - ux - v) \frac{\partial q(x)}{\partial u} + \frac{\partial b_{N-1}}{\partial u}(x - u) - b_{N-1} + \frac{\partial b_N}{\partial u},$$

$$0 = -q(x) + (x^2 - ux - v) \frac{\partial q(x)}{\partial v} + \frac{\partial b_{N-1}}{\partial v}(x - u) + \frac{\partial b_N}{\partial v}.$$

Setting here $u = s$, $v = t$, $x = z_k$ (k = 1, 2) we obtain by virtue of
$z_k^2 - s z_k - t = 0$ (k = 1, 2), and since $b_{N-1} = 0$ at (s, t) by theorem 5.6,
the four relations

(5-30)  $\dfrac{\partial b_{N-1}}{\partial u}(z_k - s) + \dfrac{\partial b_N}{\partial u} = z_k q_k$  (k = 1, 2),

(5-31)  $\dfrac{\partial b_{N-1}}{\partial v}(z_k - s) + \dfrac{\partial b_N}{\partial v} = q_k$  (k = 1, 2),

where $q_k = q(z_k)$, and where the derivatives are taken at (u, v) = (s, t).
The two relations (5-30) can be regarded as a system of two linear equations
for the two unknowns $\partial b_{N-1} / \partial u$ and $\partial b_N / \partial u$ with the nonvanishing
determinant $z_1 - z_2$. We thus have

$$\frac{\partial b_{N-1}}{\partial u} = \frac{z_1 q_1 - z_2 q_2}{z_1 - z_2}, \quad \frac{\partial b_N}{\partial u} = \frac{z_1 z_2 (q_2 - q_1) + s(z_1 q_1 - z_2 q_2)}{z_1 - z_2}.$$

Solving the system (5-31) in a similar manner, we find

$$\frac{\partial b_{N-1}}{\partial v} = \frac{q_1 - q_2}{z_1 - z_2}, \quad \frac{\partial b_N}{\partial v} = \frac{z_1 q_2 - z_2 q_1 + s(q_1 - q_2)}{z_1 - z_2}.$$

Some algebraic manipulation now yields

$$D(s, t) = \begin{vmatrix} \dfrac{\partial b_{N-1}}{\partial u} & \dfrac{\partial b_{N-1}}{\partial v} \\ \dfrac{\partial b_N}{\partial u} & \dfrac{\partial b_N}{\partial v} \end{vmatrix} = q_1 q_2 = q(z_1) q(z_2).$$

Since $z_1$ and $z_2$ are both simple zeros of p, we have $q(z_k) \ne 0$, k = 1, 2,
and the conclusion of the theorem follows.

The theory given in §5.5 now shows that under the hypotheses of
theorem 5.8, algorithm 5.7 actually defines a sequence of quadratic
polynomials $\{x^2 - u_k x - v_k\}$ which converges to $x^2 - sx - t$ whenever the
initial quadratic $x^2 - u_0 x - v_0$ is sufficiently close to $x^2 - sx - t$.
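
The identity $D(s, t) = q(z_1) q(z_2)$ can be checked numerically on a polynomial whose factors are known. In the Python sketch below (an illustration; the helper names `recurrence` and `q_at` and the test polynomial are assumptions), $p(x) = (x^2 - 2x + 2)(x^2 - 6x + 25)$, so that $s = 2$, $t = -2$, $z_{1,2} = 1 \pm i$, and $D$ is evaluated as $c_{N-2}^2 - c_{N-1} c_{N-3}$, the determinant of the Newton system of §5.7.

```python
def recurrence(coeffs, u, v):
    """Apply r_n = coeffs[n] + u*r_{n-1} + v*r_{n-2} with r_{-1} = r_{-2} = 0."""
    out, prev1, prev2 = [], 0.0, 0.0
    for value in coeffs:
        r = value + u * prev1 + v * prev2
        out.append(r)
        prev2, prev1 = prev1, r
    return out

a = [1.0, -8.0, 39.0, -62.0, 50.0]   # (x^2 - 2x + 2)(x^2 - 6x + 25)
s, t = 2.0, -2.0
N = len(a) - 1

b = recurrence(a, s, t)              # b_0, ..., b_N; the last two vanish
c = recurrence(b[:-1], s, t)         # c_0, ..., c_{N-1}

# Jacobian determinant (5-29) expressed through the c_n.
D = c[N - 2] ** 2 - c[N - 1] * c[N - 3]

def q_at(z):
    """Evaluate q(x) = b_0 x^{N-2} + ... + b_{N-2} by Horner's rule."""
    value = 0.0
    for coeff in b[:N - 1]:
        value = value * z + coeff
    return value

z1, z2 = complex(1.0, 1.0), complex(1.0, -1.0)   # zeros of x^2 - 2x + 2
product = (q_at(z1) * q_at(z2)).real

print(D, product)
```

Both quantities come out as 377 here, which is nonzero, as theorem 5.8 requires for two distinct simple zeros.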




For most polynomials even crude approximations to quadratic factors 
are hard to obtain by mere inspection. In chapter 8 we shall discuss a 
method that will automatically produce good first approximations to 
quadratic factors (with complex zeros) of almost any real polynomial. 


5.9 Steffensen’s Iteration for Systems†

We now return to ordinary iteration as applied to systems of equations
(algorithm 5.2). Our goal is to extend the Aitken-Steffensen formula
(4-11) to systems of equations. As explained in §4.4, the rationale behind
that formula consisted in neglecting the small term $\varepsilon_n$ in (4-8). Proceeding
in the same vein, we now assume heuristically that the asymptotic error
formula (5-17) is true without the term denoted by $O(\|\mathbf{d}_n\|^2)$.

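Denoting by $\{\mathbf{x}_n\}$ a sequence of vectors generated by algorithm 5.2 and
by $\mathbf{s}$ a solution of $\mathbf{x} = \mathbf{f}(\mathbf{x})$, our assumption implies that

(5-32)  $\mathbf{x}_{n+1} - \mathbf{s} = \mathbf{J}(\mathbf{x}_n - \mathbf{s})$,

for $n = 0, 1, 2, \ldots$, where $\mathbf{J} = \mathbf{J}(\mathbf{s})$ denotes the Jacobian matrix of the
function $\mathbf{f}$ taken at the solution $\mathbf{s}$. The problem is to determine $\mathbf{s}$ from
several consecutive iterates $\mathbf{x}_n, \mathbf{x}_{n+1}, \mathbf{x}_{n+2}, \ldots$, notwithstanding the fact
that $\mathbf{J}$ is unknown.

Subtracting two consecutive equations (5-32) from each other, we find,
using the symbol $\Delta$ to denote forward differences,

(5-33)  $\Delta\mathbf{x}_{n+1} = \mathbf{J} \Delta\mathbf{x}_n$,  $n = 0, 1, 2, \ldots$.

We define $\mathbf{X}_n$ to be the matrix‡ with the columns $\mathbf{x}_n$ and $\mathbf{x}_{n+1}$,

$$\mathbf{X}_n = (\mathbf{x}_n \;\; \mathbf{x}_{n+1}) = \begin{pmatrix} x_n & x_{n+1} \\ y_n & y_{n+1} \end{pmatrix}.$$

Defining $\Delta\mathbf{X}_n$ in the obvious way, (5-33) shows that

$$\mathbf{J} \Delta\mathbf{X}_n = \Delta\mathbf{X}_{n+1}, \quad n = 0, 1, 2, \ldots.$$

If the matrix $\Delta\mathbf{X}_n$ is nonsingular, we can solve for $\mathbf{J}$, finding

(5-34)  $\mathbf{J} = \Delta\mathbf{X}_{n+1} (\Delta\mathbf{X}_n)^{-1}$.

We now solve (5-32) for $\mathbf{s}$. Assuming that $\mathbf{I} - \mathbf{J}$ is nonsingular
($\mathbf{I}$ = unit matrix), we get

$$(\mathbf{I} - \mathbf{J})\mathbf{s} = \mathbf{x}_{n+1} - \mathbf{J}\mathbf{x}_n = (\mathbf{I} - \mathbf{J})\mathbf{x}_n + \Delta\mathbf{x}_n$$

and hence

$$\mathbf{s} = \mathbf{x}_n + (\mathbf{I} - \mathbf{J})^{-1} \Delta\mathbf{x}_n.$$

† This section may be omitted without loss of continuity.
‡ Here we use implicitly the fact that the vectors considered have two components.
In the case of N components we would have to put
$\mathbf{X}_n = (\mathbf{x}_n, \mathbf{x}_{n+1}, \ldots, \mathbf{x}_{n+N-1})$.

Using equation (5-34) and applying the matrix identity $(\mathbf{A}\mathbf{B})^{-1} = \mathbf{B}^{-1}\mathbf{A}^{-1}$
we have, always proceeding in a purely formal manner,

$$(\mathbf{I} - \mathbf{J})^{-1} = [\mathbf{I} - \Delta\mathbf{X}_{n+1}(\Delta\mathbf{X}_n)^{-1}]^{-1}$$
$$= [(\Delta\mathbf{X}_n - \Delta\mathbf{X}_{n+1})(\Delta\mathbf{X}_n)^{-1}]^{-1}$$
$$= -\Delta\mathbf{X}_n (\Delta^2\mathbf{X}_n)^{-1}$$

and hence finally

$$\mathbf{s} = \mathbf{x}_n - \Delta\mathbf{X}_n (\Delta^2\mathbf{X}_n)^{-1} \Delta\mathbf{x}_n.$$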


This formula for the exact solution has been derived under the assumption
that (5-17) is true without the $O(\|\mathbf{d}_n\|^2)$ term. If this term is present,
we still may hope that the vector

(5-35)  $\mathbf{x}_n' = \mathbf{x}_n - \Delta\mathbf{X}_n(\Delta^2\mathbf{X}_n)^{-1}\,\Delta\mathbf{x}_n$

is closer to $\mathbf{s}$ than $\mathbf{x}_n$, provided that the matrix $\Delta^2\mathbf{X}_n$ is nonsingular. It
will be noted that the formula (5-35) is built like (4-11), with the difference
that certain scalars are now replaced by appropriately defined vectors and
matrices.

Formula (5-35) can be used in either of two ways. We can apply
it to a sequence of vectors $\{\mathbf{x}_n\}$ already constructed to obtain a sequence
$\{\mathbf{x}_n'\}$ which presumably converges to $\mathbf{s}$ faster. More effectively, the
formula can be used as in algorithm 4.12, as follows.


Algorithm 5.9 Choose a vector $\mathbf{x}^{(0)}$ and construct the sequence of
vectors $\{\mathbf{x}^{(k)}\}$ as follows: For each $k = 0, 1, 2, \ldots$, set $\mathbf{x}_0 = \mathbf{x}^{(k)}$,
calculate $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3$ from

$\mathbf{x}_{n+1} = \mathbf{f}(\mathbf{x}_n), \quad n = 0, 1, 2,$

and let $\mathbf{x}^{(k+1)} = \mathbf{x}_0'$, where $\mathbf{x}_0'$ is defined by (5-35).


This algorithm has not yet been fully investigated from the theoretical 
point of view. As it stands, it is not even fully defined, since there is no 
indication of what is to be done if the matrix $\Delta^2\mathbf{X}_0$ is singular. It is
therefore impossible to prove that the algorithm converges, let alone that 
it converges quadratically. Substantial experimental evidence, and also 
some theoretical considerations, seem to indicate, however, that the 
algorithm is indeed quadratically convergent in a large number of cases, 
even when ordinary iteration diverges. 


EXAMPLE

5. To find the solution of the system

$x = x^2 + y^2,$
$y = x^2 - y^2,$

near (0.8, 0.4). Algorithm 5.9 yields the following values:


Table 5.9

k    x^(k)        y^(k)        x_n          y_n
0    0.8          0.4
                               0.8000000    0.4800000
                               0.8704000    0.4096000
                               0.9253683    0.5898240
1    0.7741243    0.4194303
                               0.7751902    0.4233468
                               0.7801424    0.4216974
                               0.7864508    0.4307934
2    0.7718671    0.4196500
                               0.7718850    0.4196728
                               0.7719317    0.4196813
                               0.7720109    0.4197462
3    0.7718445    0.4196434
                               0.7718445    0.4196434


The fact that $\mathbf{x}_1 = \mathbf{x}_0$ for $k = 3$ indicates that convergence has been
accomplished. It is evident in this example that the sequences $\{\mathbf{x}_n\}$ would
not converge, as they begin to diverge for each $k$ already for small values
of $n$.
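
The computation of example 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `f`, `accelerate` and `algorithm_5_9` are hypothetical, the 2-by-2 system in (5-35) is solved by Cramer's rule, and the stopping rule (stop once one ordinary iteration step changes the vector by less than `tol`) is our own choice, since the text does not prescribe one. A singular $\Delta^2\mathbf{X}_0$ is simply reported as an error.

```python
def f(v):
    """Iteration function of example 5: (x, y) -> (x^2 + y^2, x^2 - y^2)."""
    x, y = v
    return (x * x + y * y, x * x - y * y)


def sub(u, v):
    """Componentwise difference of two 2-vectors."""
    return (u[0] - v[0], u[1] - v[1])


def accelerate(f, x0):
    """One pass of algorithm 5.9 for two components: returns x0' from (5-35)."""
    x1 = f(x0)
    x2 = f(x1)
    x3 = f(x2)
    d0, d1, d2 = sub(x1, x0), sub(x2, x1), sub(x3, x2)  # first differences
    e0, e1 = sub(d1, d0), sub(d2, d1)  # columns of Delta^2 X_0
    det = e0[0] * e1[1] - e1[0] * e0[1]
    if det == 0.0:
        raise ZeroDivisionError("Delta^2 X_0 is singular")
    # Solve (Delta^2 X_0) c = Delta x_0 by Cramer's rule.
    c0 = (d0[0] * e1[1] - e1[0] * d0[1]) / det
    c1 = (e0[0] * d0[1] - d0[0] * e0[1]) / det
    # x0' = x0 - (Delta X_0) c, where Delta X_0 has the columns d0 and d1.
    return (x0[0] - (d0[0] * c0 + d1[0] * c1),
            x0[1] - (d0[1] * c0 + d1[1] * c1))


def algorithm_5_9(f, start, tol=1e-10, max_steps=20):
    """Repeat the accelerated step until one plain iteration barely moves x."""
    x = start
    for _ in range(max_steps):
        fx = f(x)
        if max(abs(fx[0] - x[0]), abs(fx[1] - x[1])) < tol:
            break
        x = accelerate(f, x)
    return x


if __name__ == "__main__":
    print(accelerate(f, (0.8, 0.4)))
    print(algorithm_5_9(f, (0.8, 0.4)))
```

Starting from (0.8, 0.4), the first accelerated vector agrees with the row $k = 1$ of table 5.9 to the digits shown there.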


Problems 


12. Assuming (5-32) to be exact, show that applying Aitken’s formula (4-11) 
individually to each component of x does, in general, not produce the 
correct solution vector s. 


13*. Assuming (5-32) to be exact, show that the matrix $\Delta^2\mathbf{X}_0$ is singular if and
only if at least one of the following situations obtains:

(i) $\mathbf{x}_0 = \mathbf{s}$;
(ii) $\mathbf{x}_0 - \mathbf{s}$ is an eigenvector of $\mathbf{J}$;
(iii) the matrix $\mathbf{J}$ is similar to a matrix $c\mathbf{I}$, where $c$ is real.


Show that in the cases (ii) and (iii) the procedure described in problem 12 
is effective. 

14. Find the solution of the system considered in example 2 by means of
algorithm 5.9.




Recommended Reading 

The abstract background of the method of iteration is discussed in most 
textbooks on functional analysis; see, for example, Liusternik and 
Sobolev [1961], p. 27. A beautiful discussion of Newton's method for
systems of equations is given by Kantorovich [1948] (see also Henrici
[1962], pp. 367-371).


Research Problems


1. Formulate Newton's method for systems of equations with an
arbitrary number of unknowns.

2. Generalize Bairstow's method to extract from a polynomial of degree
$N$ a factor of arbitrary degree $n < N$.

3. Discuss Newton's method in the "singular" case when the determinant
(5-19) is zero at the solution $(s, t)$.

4. Discuss experimentally the stability of algorithm 5.9 when the matrix
$\Delta^2\mathbf{X}_0$ is nearly singular.

5. Develop a theory of iteration for systems if the vectors $\mathbf{x}_n$ are formed
according to

$x_{n+1} = f(x_n, y_n),$
$y_{n+1} = g(x_{n+1}, y_n)$

(i.e., the most recent information is used at each step). What is the
correct formulation of the $\Delta^2$ procedure in this case?


chapter 6 linear difference equations


Linear difference equations were first encountered in §3.3, where the main
topic was the study of linear difference equations of the first order. In
the present chapter, we shall consider linear difference equations of
arbitrary order. The theory of such difference equations is required for
an understanding of some important algorithms to be discussed in the
chapters 7 and 8. Furthermore, linear difference equations are a useful
tool in the study of many other processes of numerical analysis; see in
particular the chapters 14 and 16. Much unnecessary complication is
avoided by considering difference equations with complex coefficients and
solutions.


6.1 Notation 
We recall that a linear difference equation of order $N$ has the form

(6-1)  $a_{0,n} x_n + a_{1,n} x_{n-1} + \cdots + a_{N,n} x_{n-N} = b_n.$

Here $\{a_{0,n}\}, \{a_{1,n}\}, \ldots, \{a_{N,n}\}$ and $\{b_n\}$ are given sequences, and $\{x_n\}$ is a
sequence to be determined. The difference equation (6-1) is called
homogeneous if $\{b_n\}$ is the zero sequence, i.e., if all its elements are zero.
Although much of the subsequent theory can easily be extended to the
general case, we shall be concerned exclusively with the case of linear
difference equations with constant coefficients. In this case, $a_{k,n} = a_k$ for
all values of $n$ and for certain (real or complex) constants $a_0, a_1, \ldots, a_N$.
(It is not required that $b_n = b$, however.) Without loss of generality we
may assume that $a_0 \neq 0$, $a_N \neq 0$, for otherwise the order of the difference
equation could be reduced. Dividing through by $a_0$ and renaming the
constants and the elements of the sequence $\{b_n\}$, the linear difference
equation with constant coefficients appears in the form

(6-2)  $x_n + a_1 x_{n-1} + a_2 x_{n-2} + \cdots + a_N x_{n-N} = b_n,$

where $a_N \neq 0$.




We can aid the understanding by two notational simplifications. First, 
we shall no longer refer to a sequence by explicitly exhibiting its elements, 
as in the symbol {x,}, but instead denote a sequence by a single capital 
letter. The $n$th element of a sequence $X$ will be denoted by $x_n$, or sometimes
also by $(X)_n$. Secondly, if $X = \{x_n\}$ is any sequence, we shall denote
by $\mathcal{L}X$ the sequence whose $n$th element is given by

$(\mathcal{L}X)_n = x_n + a_1 x_{n-1} + a_2 x_{n-2} + \cdots + a_N x_{n-N}.$

With these notations, the problem of solving the difference equation (6-2)
is the same as the problem of finding a sequence $X$ such that

(6-3)  $\mathcal{L}X = B,$

where $B$ denotes the sequence of the nonhomogeneous terms $b_n$.
We note that the operator $\mathcal{L}$ defined above is a linear operator. Defining
the product $aX$ of a scalar $a$ and of a sequence $X$ by

$(aX)_n = a(X)_n$

and the sum of two sequences $X$ and $Y$ by

$(X + Y)_n = (X)_n + (Y)_n,$

we have for arbitrary scalars $a$ and $b$ and for any two sequences $X$ and $Y$

(6-4)  $\mathcal{L}(aX + bY) = a\mathcal{L}X + b\mathcal{L}Y.$

The comprehension of the above notation may be helped by considering
analogous simplifications in the theory of functions. Writing $X$ in place
of $\{x_n\}$ is much like writing $f$ in place of $f(x)$ to denote a function. The
operator $\mathcal{L}$ plays a role similar to that, say, of the differentiation operator
$D$, which associates with a function $f$ its derivative $Df = f'$. Relation (6-4)
is the analog of the familiar fact that differentiation is a linear operation,
i.e., that $D(af + bg) = aDf + bDg$.
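
The operator notation can be made concrete with a small Python sketch. It is an illustration only: the name `apply_L`, the representation of a sequence as a function of $n$, and the particular sequences used are assumptions made for this example and do not come from the text. The loop checks relation (6-4) numerically for one choice of scalars.

```python
def apply_L(a, X, n):
    """(L X)_n = x_n + a_1 x_{n-1} + ... + a_N x_{n-N}.

    `a` is the list [a_1, ..., a_N]; `X` is a sequence given as a function of n.
    """
    return X(n) + sum(a_k * X(n - k) for k, a_k in enumerate(a, start=1))


a = [-1.0, -1.0]  # operator of the equation x_n - x_{n-1} - x_{n-2} = b_n


def X(n):
    return float(n * n)


def Y(n):
    return 2.0 ** n


c, d = 3.0, -2.0  # arbitrary scalars


def Z(n):
    """The sequence cX + dY, formed element by element."""
    return c * X(n) + d * Y(n)


# Relation (6-4): L(cX + dY) = c LX + d LY, checked element by element.
for n in range(2, 8):
    lhs = apply_L(a, Z, n)
    rhs = c * apply_L(a, X, n) + d * apply_L(a, Y, n)
    assert abs(lhs - rhs) < 1e-9
```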


6.2 Particular Solutions of the Homogeneous Equation of Order Two 


We shall consider in some detail the case $N = 2$. Here we have, for
some constants $a_1$ and $a_2 \neq 0$,

(6-5)  $(\mathcal{L}X)_n = x_n + a_1 x_{n-1} + a_2 x_{n-2}.$

Our first task is to find solutions of the homogeneous equation $\mathcal{L}X = 0$.†


† Here the symbol 0 does not denote the scalar zero, but rather the sequence whose
elements are all zero. Since no misunderstanding is possible, we shall not attempt
to make a graphical distinction between the two concepts.




Let us take a look at the corresponding problem in the theory of
differential equations. The linear homogeneous differential equation of
order 2 with constant coefficients,

$x'' + a_1 x' + a_2 x = 0 \quad \left(x' = \frac{dx}{dt}\right),$

has, for suitably chosen $r$, solutions of the form $e^{rt}$. Is the same true also
for the difference equation

(6-6)  $x_n + a_1 x_{n-1} + a_2 x_{n-2} = 0?$

Replacing $t$ by the discrete variable $n$, we are tempted to seek solutions of
the form $x_n = e^{rn} = (e^r)^n$, or, putting $e^r = z$, of the form $x_n = z^n$. With
this definition of $X = \{x_n\}$ we have

$(\mathcal{L}X)_n = z^n + a_1 z^{n-1} + a_2 z^{n-2}$
$\qquad\quad = z^{n-2}(z^2 + a_1 z + a_2).$

This expression is 0 not only if $z = 0$ (which would yield the trivial
solution of the difference equation) but also if $z$ is a zero of the polynomial
$p$ defined by

(6-7)  $p(z) = z^2 + a_1 z + a_2.$

This polynomial is called the characteristic polynomial of the difference
equation (6-6). We know from the theory of equations that $p$ has exactly
two zeros, $z_1$ and $z_2$, which may be real or complex, and which may
coincide. If $z_1 \neq z_2$, then the sequences with elements $z_1^n$ and $z_2^n$ represent
two distinct, nonzero solutions of $\mathcal{L}X = 0$.

We can also find two distinct solutions if $z_1 = z_2$. We recall that if $z_1$
is a zero of multiplicity $>1$ of $p$, then by theorem 2.6 not only $p(z_1) = 0$
but also $p'(z_1) = 0$. This suggests finding a second solution by differentiation.
If $(X)_n = z^n$, we have identically in $z$

$(\mathcal{L}X)_n = z^n + a_1 z^{n-1} + a_2 z^{n-2}$
$\qquad\quad = z^{n-2}p(z).$

Differentiating with respect to $z$, we get

$n z^{n-1} + a_1(n-1)z^{n-2} + a_2(n-2)z^{n-3}$
$\qquad\quad = (n-2)z^{n-3}p(z) + z^{n-2}p'(z).$

For $z = z_1$ the expression on the right is zero for all $n$, showing that the
sequence with $n$th element $n z_1^{n-1}$ is also a solution. We thus have
obtained:


Theorem 6.2 Let $a_1$ and $a_2 \neq 0$ be constants, and consider the
difference equation $\mathcal{L}X = 0$, where $\mathcal{L}$ is defined by (6-5). If $z_1$, $z_2$
are two distinct zeros of the characteristic polynomial, then the two
sequences $X^{(1)}$, $X^{(2)}$ defined by

$(X^{(1)})_n = z_1^n, \quad (X^{(2)})_n = z_2^n$

are solutions of $\mathcal{L}X = 0$. If $z_1$ is a zero of multiplicity 2, then the
sequences defined by

$(X^{(1)})_n = z_1^n, \quad (X^{(2)})_n = n z_1^{n-1}$

are solutions.
EXAMPLES

1. We consider the difference equation $x_n = x_{n-1} + x_{n-2}$. The
characteristic polynomial $p(z) = z^2 - z - 1$ has the two zeros

$z_1 = \frac{1 + \sqrt{5}}{2}, \quad z_2 = \frac{1 - \sqrt{5}}{2};$

thus two solutions are given by

$(X^{(1)})_n = \left(\frac{1 + \sqrt{5}}{2}\right)^n, \quad (X^{(2)})_n = \left(\frac{1 - \sqrt{5}}{2}\right)^n.$

2. Consider $x_n - 2x_{n-1} + x_{n-2} = 0$. The characteristic polynomial
is $p(z) = z^2 - 2z + 1$; $z_1 = 1$ is a double zero. Thus we find the two
solutions

$(X^{(1)})_n = 1^n = 1, \quad (X^{(2)})_n = n\,1^{n-1} = n,$

as can be verified directly.
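
The two examples can be checked numerically. The following Python sketch is only an illustration; the helper names `char_zeros` and `residual` are hypothetical. It computes the zeros of the characteristic polynomial with the quadratic formula and evaluates $(\mathcal{L}X)_n$ for the sequences of theorem 6.2, which should vanish up to rounding error.

```python
import cmath


def char_zeros(a1, a2):
    """Zeros of p(z) = z^2 + a1*z + a2; they may be complex or coincide."""
    root = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + root) / 2.0, (-a1 - root) / 2.0


def residual(a1, a2, x, n):
    """(L X)_n = x_n + a1*x_{n-1} + a2*x_{n-2} for a sequence given as x(n)."""
    return x(n) + a1 * x(n - 1) + a2 * x(n - 2)


# Example 1: x_n = x_{n-1} + x_{n-2}, i.e. a1 = a2 = -1 (two distinct zeros).
z1, z2 = char_zeros(-1.0, -1.0)
worst_distinct = max(
    abs(residual(-1.0, -1.0, lambda n, z=z: z ** n, n))
    for z in (z1, z2)
    for n in range(2, 12)
)

# Example 2: x_n - 2 x_{n-1} + x_{n-2} = 0 (double zero z1 = 1);
# the second solution of theorem 6.2 is n * z1^(n-1).
d1, d2 = char_zeros(-2.0, 1.0)
worst_double = max(
    abs(residual(-2.0, 1.0, lambda n: n * d1 ** (n - 1), n))
    for n in range(2, 12)
)

print(z1, z2, worst_distinct, worst_double)
```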


6.3 The General Solution 


Throughout this section, $\mathcal{L}$ will denote the difference operator defined
by (6-5), where $a_2 \neq 0$. The first tool is the following:


Lemma 6.3a If $X^{(1)}$ and $X^{(2)}$ are any two solutions of $\mathcal{L}X = 0$, and
if $c_1$ and $c_2$ are any two constants, then the sequence

$X = c_1 X^{(1)} + c_2 X^{(2)}$

is also a solution.


Proof. By the linearity of the operator $\mathcal{L}$,

$\mathcal{L}X = \mathcal{L}(c_1 X^{(1)} + c_2 X^{(2)})$
$\qquad = c_1 \mathcal{L}X^{(1)} + c_2 \mathcal{L}X^{(2)} = 0.$

As an exercise in the application of lemma 6.3a, we consider the problem
of finding real solutions of real difference equations. Let $\mathcal{L}$ be defined
as above, where $a_1$ and $a_2$ are real. The characteristic polynomial

$p(z) = z^2 + a_1 z + a_2,$

although a polynomial with real coefficients, may still have nonreal zeros
$z_1$, $z_2$. However, these zeros according to theorem 2.5d then are complex
conjugate, i.e.,

$z_2 = \bar{z}_1,$

or, denoting by $\operatorname{Re} a$ and $\operatorname{Im} a$ the real and imaginary parts of a complex
number $a$,

$\operatorname{Re} z_1 = \operatorname{Re} z_2, \quad \operatorname{Im} z_1 = -\operatorname{Im} z_2.$

If the two zeros are nonreal, two distinct solutions are given by

$(X^{(1)})_n = z_1^n, \quad (X^{(2)})_n = (\bar{z}_1)^n.$

Since the product of complex conjugate numbers is equal to the complex
conjugate number of the product, we can also write

$(X^{(2)})_n = \overline{z_1^n}.$

It now follows from lemma 6.3a that the two real sequences $Y^{(1)}$, $Y^{(2)}$
defined by

$(Y^{(1)})_n = \tfrac{1}{2}\left[z_1^n + \overline{z_1^n}\right] = \operatorname{Re} z_1^n,$
$(Y^{(2)})_n = \tfrac{1}{2i}\left[z_1^n - \overline{z_1^n}\right] = \operatorname{Im} z_1^n,$

are likewise solutions of the difference equation.


EXAMPLE

3. Let $-1 < t < 1$, and consider the difference equation

(6-8)  $x_n - 2t x_{n-1} + x_{n-2} = 0.$

The characteristic polynomial $p(z) = z^2 - 2tz + 1$ has the zeros

$z_1 = t + i\sqrt{1 - t^2}, \quad z_2 = t - i\sqrt{1 - t^2}.$

If we define the angle $\varphi$ by the condition

$\cos\varphi = t \quad (0 < \varphi < \pi),$

then the zeros appear in the form

$z_1 = \cos\varphi + i\sin\varphi = e^{i\varphi},$
$z_2 = \cos\varphi - i\sin\varphi = e^{-i\varphi}.$

Consequently, the two solutions given by theorem 6.2 appear in the form

$(X^{(1)})_n = (e^{i\varphi})^n = e^{in\varphi},$
$(X^{(2)})_n = (e^{-i\varphi})^n = e^{-in\varphi},$

and are complex. However, in view of $e^{in\varphi} = \cos n\varphi + i\sin n\varphi$, also the
sequences defined by

$(Y^{(1)})_n = \cos n\varphi, \quad (Y^{(2)})_n = \sin n\varphi$

are solutions. This may be verified directly by using the addition
theorems of the trigonometric functions.

It is readily seen that the elements of the sequence $Y^{(1)}$ are polynomials
of degree $n$ in the variable $t = \cos\varphi$. In fact, $(Y^{(1)})_0 = 1$, $(Y^{(1)})_1 = t$,
and (6-8) shows that if this property holds for the integers $n - 2$ and $n - 1$,
it holds for the integer $n$. Conventionally, these polynomials are denoted
by $T_n(t)$ and are called Chebyshev polynomials of degree $n$. By the above
we can also write

$T_n(t) = \cos(n \arccos t).$

The Chebyshev polynomials are important in many branches of numerical
analysis (see §9.4).
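
The two descriptions of $T_n(t)$ in example 3 can be compared numerically. The sketch below is an illustration only; the function name `chebyshev_by_recurrence` is hypothetical. It generates $T_n(t)$ from the difference equation (6-8) with the starting values $x_0 = 1$, $x_1 = t$ and compares the result with $\cos(n \arccos t)$.

```python
import math


def chebyshev_by_recurrence(n, t):
    """T_n(t) from (6-8): x_n = 2 t x_{n-1} - x_{n-2}, with x_0 = 1, x_1 = t."""
    prev, cur = 1.0, t
    if n == 0:
        return prev
    for _ in range(n - 1):
        prev, cur = cur, 2.0 * t * cur - prev
    return cur


# Compare with the closed form for a few values of t in (-1, 1).
for t in (-0.9, -0.3, 0.0, 0.5, 0.8):
    for n in range(0, 9):
        closed_form = math.cos(n * math.acos(t))
        assert abs(chebyshev_by_recurrence(n, t) - closed_form) < 1e-12
```

For instance, the recurrence gives $T_2(t) = 2t^2 - 1$, so that $T_2(0.5) = -0.5$.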


We return to the problem of finding all solutions of $\mathcal{L}X = 0$. Our
second tool is the following simple observation.


Lemma 6.3b Let $I$ be a set of consecutive integers, and let the integers
$m$ and $m - 1$ be in $I$. Let the sequence $B = \{b_n\}$ be defined on $I$.
Then the difference equation $\mathcal{L}X = B$ has precisely one solution
which assumes given values for $n = m$ and $n = m - 1$.


Proof. Let $X^{(1)}$ and $X^{(2)}$ be two solutions having the same values at
$n = m$ and $n = m - 1$. The sequence $D = X^{(1)} - X^{(2)} = \{d_n\}$ then is a
solution of $\mathcal{L}X = 0$ assuming the values 0 for $n = m$ and $n = m - 1$.
If $X^{(1)}$ and $X^{(2)}$ are not identical, then $d_n \neq 0$ for some $n > m$ or some
$n < m - 1$. To fix ideas, let $d_n \neq 0$ for some $n > m$. If we denote by
$n$ the smallest integer for which this is the case, then, because $D$ is a
solution,

$d_n + a_1 d_{n-1} + a_2 d_{n-2} = 0.$

However, since $d_{n-1} = d_{n-2} = 0$, $d_n \neq 0$, this equation reveals a
contradiction. The case where some $d_n \neq 0$ with $n < m - 1$ is dealt with
similarly, making use of the fact that $a_2 \neq 0$. It follows that $D$ is the
zero sequence, and hence that the sequences $X^{(1)}$ and $X^{(2)}$ are identical.
The zero sequence is a solution of $\mathcal{L}X = 0$ with the property that its
elements are zero for any two consecutive values of $n$. Lemma 6.3b
shows that this property characterizes the zero solution. In other
words: If a solution of $\mathcal{L}X = 0$ vanishes at two consecutive integers, it
vanishes identically.

Let now $X^{(1)} = \{x_n^{(1)}\}$ and $X^{(2)} = \{x_n^{(2)}\}$ be two known solutions of
$\mathcal{L}X = 0$, defined for $-\infty < n < \infty$, and let $X = \{x_n\}$ be an arbitrary
third solution. Is it possible to find constants $c_1$ and $c_2$ such that

$X = c_1 X^{(1)} + c_2 X^{(2)}?$

If that is the case, then we are entitled to call the sequence $c_1 X^{(1)} + c_2 X^{(2)}$
the general solution of $\mathcal{L}X = 0$, for any special solution can be obtained
by assigning special values to the constants $c_1$ and $c_2$.

By lemma 6.3a, the sequence $c_1 X^{(1)} + c_2 X^{(2)}$ is in any case a solution of
$\mathcal{L}X = 0$. By lemma 6.3b, this solution is identical with $X$ if it agrees
with $X$ at two consecutive integers, $n$ and $n - 1$, say. In order to obtain
this agreement, we must be able to determine the constants $c_1$ and $c_2$ such
that the two equations

(6-9)  $c_1 x_n^{(1)} + c_2 x_n^{(2)} = x_n,$
       $c_1 x_{n-1}^{(1)} + c_2 x_{n-1}^{(2)} = x_{n-1}$

are satisfied. By the theory of linear equations, this system has a solution
for arbitrary $x_n$ and $x_{n-1}$ if its determinant

$w_n = \begin{vmatrix} x_n^{(1)} & x_n^{(2)} \\ x_{n-1}^{(1)} & x_{n-1}^{(2)} \end{vmatrix}$

is different from zero.

The determinant $w_n$ is called the Wronskian determinant of the sequences
$X^{(1)}$ and $X^{(2)}$ at the point $n$. We have obtained


Theorem 6.3 Let $X^{(1)}$ and $X^{(2)}$ be two particular solutions of
$\mathcal{L}X = 0$. Then every solution of $\mathcal{L}X = 0$ can be expressed in the
form $c_1 X^{(1)} + c_2 X^{(2)}$ if and only if the Wronskian determinant $w_n$
of $X^{(1)}$ and $X^{(2)}$ is different from zero for all values of $n$.


The requirement that $w_n \neq 0$ for all $n$ is less stringent than it appears,
as will be seen presently.
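
As an illustration of how the system (6-9) determines the constants, the following Python sketch solves it by Cramer's rule. The function name `fit_constants` and the test equation $x_n - 5x_{n-1} + 6x_{n-2} = 0$ (characteristic zeros 2 and 3, so that $2^n$ and $3^n$ are solutions by theorem 6.2) are assumptions chosen for this example, not taken from the text.

```python
def fit_constants(x1, x2, n, xn, xn_minus_1):
    """Solve (6-9) for (c1, c2) by Cramer's rule.

    x1, x2 are two known solutions given as functions of n; xn and xn_minus_1
    are the prescribed values of the third solution at n and n - 1.
    """
    w = x1(n) * x2(n - 1) - x2(n) * x1(n - 1)  # Wronskian determinant w_n
    if w == 0:
        raise ValueError("Wronskian vanishes; (6-9) has no unique solution")
    c1 = (xn * x2(n - 1) - x2(n) * xn_minus_1) / w
    c2 = (x1(n) * xn_minus_1 - xn * x1(n - 1)) / w
    return c1, c2


def first(n):
    return 2.0 ** n


def second(n):
    return 3.0 ** n


# Third solution: the one with x_0 = 1, x_1 = 0, generated by the recurrence.
x = [1.0, 0.0]
for n in range(2, 10):
    x.append(5.0 * x[n - 1] - 6.0 * x[n - 2])

c1, c2 = fit_constants(first, second, 1, x[1], x[0])
print(c1, c2)
```

Here the constants come out as $c_1 = 3$, $c_2 = -2$, and $3 \cdot 2^n - 2 \cdot 3^n$ reproduces every element generated by the recurrence.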
EXAMPLES

4. Let the two zeros $z_1$ and $z_2$ of the characteristic polynomial $p$ of
$\mathcal{L}X = 0$ be different. We calculate the Wronskian of the two solutions

$x_n^{(1)} = z_1^n, \quad x_n^{(2)} = z_2^n$

given by theorem 6.2. We find

$w_n = \begin{vmatrix} z_1^n & z_2^n \\ z_1^{n-1} & z_2^{n-1} \end{vmatrix} = (z_1 z_2)^{n-1}(z_1 - z_2).$

Evidently $w_n \neq 0$ for all $n$. Hence the general solution of the difference
equation is given by $c_1 z_1^n + c_2 z_2^n$.

5. If $z_1$ is a zero of multiplicity 2 of the characteristic polynomial, two
solutions of the difference equation are given by

$x_n^{(1)} = z_1^n, \quad x_n^{(2)} = n z_1^{n-1}.$

We find

$w_n = \begin{vmatrix} z_1^n & n z_1^{n-1} \\ z_1^{n-1} & (n-1) z_1^{n-2} \end{vmatrix} = -z_1^{2n-2} \neq 0;$

hence

$x_n = c_1 z_1^n + c_2 n z_1^{n-1}$

is the general solution.

6. As a somewhat more special example, we consider the real solutions
of the difference equation

$x_n - 2t x_{n-1} + x_{n-2} = 0$

found in example 3. If $t = \cos\varphi$, we were able to express these solutions
in the form $x_n^{(1)} = \cos n\varphi$, $x_n^{(2)} = \sin n\varphi$. For these solutions,

$w_n = \begin{vmatrix} \cos n\varphi & \sin n\varphi \\ \cos (n-1)\varphi & \sin (n-1)\varphi \end{vmatrix}$
$\quad = \cos n\varphi \, \sin (n-1)\varphi - \cos (n-1)\varphi \, \sin n\varphi$
$\quad = -\sin\varphi \neq 0.$

Thus the general solution can be expressed in the form

$c_1 \cos n\varphi + c_2 \sin n\varphi.$

(Note, however, the restriction on $t$.)


We are now in a position to solve arbitrary initial value problems for
the linear difference equation $\mathcal{L}X = 0$. All we have to do is to find two
particular solutions $X^{(1)}$, $X^{(2)}$ with nonvanishing Wronskian and to
determine the constants $c_1$ and $c_2$ such that the sequence $X = c_1 X^{(1)}
+ c_2 X^{(2)}$ satisfies the given initial conditions.

EXAMPLE

7. Let us find the solution of

$x_n = x_{n-1} + x_{n-2}$

satisfying the conditions $x_{-1} = 0$, $x_0 = 1$. By the examples 1 and 4,
the general solution of the difference equation is

$x_n = c_1 \left(\frac{1 + \sqrt{5}}{2}\right)^n + c_2 \left(\frac{1 - \sqrt{5}}{2}\right)^n.$

The initial conditions yield the following conditions on $c_1$ and $c_2$:

$c_1 + c_2 = 1,$
$c_1 \left(\frac{1 + \sqrt{5}}{2}\right)^{-1} + c_2 \left(\frac{1 - \sqrt{5}}{2}\right)^{-1} = 0.$

We easily find the solution

$c_1 = \frac{1 + \sqrt{5}}{2\sqrt{5}}, \quad c_2 = -\frac{1 - \sqrt{5}}{2\sqrt{5}};$

thus the solution of our initial value problem is given by the formula

$x_n = \frac{1}{\sqrt{5}} \left[ \left(\frac{1 + \sqrt{5}}{2}\right)^{n+1} - \left(\frac{1 - \sqrt{5}}{2}\right)^{n+1} \right].$

It is not immediately evident from this representation that the numbers $x_n$
all are integers. The sequence thus defined is called the Fibonacci
sequence. It has many interesting number-theoretical properties.


Problems

1. Express in real terms a general solution of the following difference
equations:

(a) $x_n - x_{n-2} = 0$;
(b) $x_n + x_{n-2} = 0$;
(c) [illegible]

2. Find a general solution of the difference equation

[illegible]

3. Let $a$, $b$ be real, $a^2 - 4b < 0$. Show that the general solution of the
difference equation

$x_n + a x_{n-1} + b x_{n-2} = 0$

can be expressed in the form

$x_n = r^n (c_1 \cos n\varphi + c_2 \sin n\varphi),$

where $r$ and $\varphi$ are real.

4. Let $X = \{x_n\}$ denote the Fibonacci sequence. Show that

$\lim_{n \to \infty} \frac{x_{n+1}}{x_n} = \frac{1 + \sqrt{5}}{2}.$

5. Let $b$, $c$ be real. Show that a necessary and sufficient condition in order
that all solutions of the difference equation

$x_n + 2b x_{n-1} + c x_{n-2} = 0$

tend to zero for $n \to \infty$ is as follows: The point $(b, c)$ lies in the interior
of the triangular region of the $(b, c)$ plane bounded by the straight lines

$c = 1, \quad 2b - 1 = c, \quad -2b - 1 = c$

(see Fig. 6.3). (Hint: Treat separately the two cases where the zeros of
the characteristic polynomial are real or imaginary.)

Figure 6.3

6*. Formulate and prove results analogous to those of §6.3 for linear difference 
equations with variable coefficients. 

7. Solve the following initial value problems for difference equations: 

(a) Xp = 3xn-1—-Xn-e = =— Xo HL OM = 2; 
(b) $x_n - 2x_{n-1} + 2x_{n-2} = 0$, $x_0 = x_1 = 1$.


6.4 Linear Dependence 

The results of §6.3 can be formulated somewhat more elegantly by
introducing the concepts of linear dependence and linear independence
of sequences. Two sequences $X^{(1)}$ and $X^{(2)}$ defined on a set $I$ of integers
(but not necessarily solutions of a difference equation such as $\mathcal{L}X = 0$)
are called linearly dependent if there exist two constants $c_1$, $c_2$, not both
zero, such that the sequence

$X = c_1 X^{(1)} + c_2 X^{(2)}$

is the zero sequence on $I$. If no such constants exist, the sequences are
called linearly independent.

EXAMPLES

8. Let $I$ be the set of integers from 1 to 20 and let

$X^{(1)} = \{1, 1, \ldots, 1, 0, 0, \ldots, 0\}$ (10 elements equal to 1, then 10 elements equal to 0),
$X^{(2)} = \{0, 0, \ldots, 0, 1, 1, \ldots, 1\}$ (10 elements equal to 0, then 10 elements equal to 1).

The sequences $X^{(1)}$ and $X^{(2)}$ are independent. For assume that

$c_1 X^{(1)} + c_2 X^{(2)} = 0$

for $n = 1, \ldots, 20$. For $1 \le n \le 10$ this implies $c_1 = 0$, for $11 \le n \le 20$
it implies $c_2 = 0$. Thus $c_1 X^{(1)} + c_2 X^{(2)} = 0$ is possible only for
$c_1 = c_2 = 0$.

9. If one of the sequences $X^{(1)}$ and $X^{(2)}$ is the zero sequence, the two
sequences are always linearly dependent.

To find out whether two given sequences are independent may be
difficult in general. However, if the two sequences are both solutions of
the same linear difference equation $\mathcal{L}X = 0$, their linear dependence or
independence is closely connected with their Wronskian determinants.

Theorem 6.4 Let $X^{(1)}$ and $X^{(2)}$ be two solutions of $\mathcal{L}X = 0$, and
let $W = \{w_n\}$ be the sequence of their Wronskian determinants.

(i) If $X^{(1)}$ and $X^{(2)}$ are linearly dependent, then $W$ is the zero
sequence.

(ii) If $w_m = 0$ for some integer $m$, then the two solutions are linearly
dependent.


An immediate consequence of theorem 6.4 is the following


Corollary 6.4 If $w_m \neq 0$ for some integer $m$, then $w_n \neq 0$ for all $n$
and the solutions $X^{(1)}$ and $X^{(2)}$ are linearly independent.


Proof of theorem 6.4. (i) If $X^{(1)}$ and $X^{(2)}$ are linearly dependent, then
there exist, by definition, two constants $c_1$ and $c_2$, not both zero, so that

$c_1 X^{(1)} + c_2 X^{(2)} = 0.$

In particular, for an arbitrary integer $n$,

$c_1 x_n^{(1)} + c_2 x_n^{(2)} = 0,$
$c_1 x_{n-1}^{(1)} + c_2 x_{n-1}^{(2)} = 0.$

This is a homogeneous linear system of two equations with the nontrivial
solution $(c_1, c_2)$. Its determinant must therefore vanish, showing that
$w_n = 0$, as required.

(ii) Let $m$ be an integer such that $w_m = 0$. The homogeneous system of
two linear equations with two unknowns,

$c_1 x_m^{(1)} + c_2 x_m^{(2)} = 0,$
$c_1 x_{m-1}^{(1)} + c_2 x_{m-1}^{(2)} = 0,$

then has a nontrivial solution $(c_1, c_2)$. We define the sequence

$X = c_1 X^{(1)} + c_2 X^{(2)}.$

By lemma 6.3a, $X$ is a solution of $\mathcal{L}X = 0$. By construction, this
solution is zero at the two consecutive points $n = m$ and $n = m - 1$.
Hence, by lemma 6.3b, $X$ is the zero sequence. It follows that $X^{(1)}$ and
$X^{(2)}$ are linearly dependent.

If $A$ and $B$ are any two sequences, and if $c_1$, $c_2$ are any two constants,
the sequence $c_1 A + c_2 B$ is called a linear combination of the sequences $A$
and $B$. Using the concept of linear dependence, the content of theorem
6.3 can now be phrased more elegantly as follows: Every solution of
$\mathcal{L}X = 0$ can be expressed as a linear combination of two fixed, linearly
independent solutions.
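
The dichotomy expressed by theorem 6.4 and corollary 6.4 can be observed numerically. The following Python sketch is an illustration only; the function name `wronskian` and the value $\varphi = 0.7$ are arbitrary choices. It uses the solutions $\cos n\varphi$ and $\sin n\varphi$ of (6-8), whose Wronskian was found in example 6 to be $-\sin\varphi$ for every $n$, and contrasts them with a linearly dependent pair.

```python
import math


def wronskian(x1, x2, n):
    """w_n = x1_n * x2_{n-1} - x2_n * x1_{n-1} for sequences given as functions."""
    return x1(n) * x2(n - 1) - x2(n) * x1(n - 1)


phi = 0.7  # any angle with 0 < phi < pi


def cos_seq(n):
    return math.cos(n * phi)


def sin_seq(n):
    return math.sin(n * phi)


def doubled_cos_seq(n):
    """A constant multiple of cos_seq, hence linearly dependent on it."""
    return 2.0 * cos_seq(n)


independent = [wronskian(cos_seq, sin_seq, n) for n in range(-5, 6)]
dependent = [wronskian(cos_seq, doubled_cos_seq, n) for n in range(-5, 6)]

print(independent[0], -math.sin(phi))
print(max(abs(w) for w in dependent))
```

The first list is constant (equal to $-\sin\varphi$ up to rounding) and never zero; the second consists of zeros only.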


Problems 


8. Show that if two sequences are linearly dependent, one is a constant 
multiple of the other. 

9. Let $W = \{w_n\}$ be the sequence of Wronskian determinants of two
solutions of $\mathcal{L}X = 0$. Show that $W$ satisfies the first order difference
equation

$w_n = a_2 w_{n-1}.$

Hence give an independent proof of the fact that either $W$ is the zero
sequence, or $w_n \neq 0$ for all $n$.

10. Let $X = \{x_n\}$ be a solution of $\mathcal{L}X = 0$ such that $x_n \neq 0$ for all $n$.
Show that any solution $Y = \{y_n\}$ of the linear difference equation of
order 1 with variable coefficients,


An 
Xn -1 


Pi = Va-1 7 ag 
satisfies $\mathcal{L}Y = 0$, and that the solutions $X$ and $Y$ are linearly
independent.

11*. Extend the results of §6.4 to linear difference equations with variable 
coefficients. 


6.5 The Non-homogeneous Equation of Order Two 


We now shall discuss the solution of the equation $\mathcal{L}X = B$ or, more
explicitly,

(6-10)  $x_n + a_1 x_{n-1} + a_2 x_{n-2} = b_n,$

where the sequence $B = \{b_n\}$ is defined on some set $I$ of consecutive
integers. Much of the corresponding theory for differential equations
again carries over. For instance we have the following result.


Theorem 6.5 Let $Y$ be a special solution of $\mathcal{L}X = B$, and let $X^{(1)}$
and $X^{(2)}$ be two linearly independent solutions of the corresponding
homogeneous equation $\mathcal{L}X = 0$. Then every solution $X$ of $\mathcal{L}X = B$
can be expressed in the form

$X = c_1 X^{(1)} + c_2 X^{(2)} + Y,$

where $c_1$, $c_2$ are suitable constants.

Proof. By hypothesis, if $Y = \{y_n\}$,

(6-11)  $y_n + a_1 y_{n-1} + a_2 y_{n-2} = b_n$

for all $n \in I$. If $X = \{x_n\}$ is any solution of (6-10), we obtain by
subtracting (6-11) from (6-10) and setting $D = \{d_n\} = X - Y$,

$d_n + a_1 d_{n-1} + a_2 d_{n-2} = 0.$

Thus, the sequence $D$ is a solution of the homogeneous equation. By
theorem 6.3 it can be written in the form

$D = c_1 X^{(1)} + c_2 X^{(2)}$

for suitable $c_1$, $c_2$. Since $X = Y + D$, the desired result follows.

Theorem 6.5 reduces the problem of finding the general solution of the
nonhomogeneous equation to the problem of finding one particular
solution of it. As in the case of differential equations, such a particular
solution can frequently be found by an inspired guess. For instance, if
the elements of $B$ are $bq^n$, where $b$ and $q \neq 0$ are constants, a special
solution of $\mathcal{L}X = B$ can frequently be found in the form $x_n = aq^n$, where
$a$ is a constant to be determined. Substituting into (6-10) yields

$a(q^n + a_1 q^{n-1} + a_2 q^{n-2}) = bq^n,$

or, denoting by $p$ the characteristic polynomial of the homogeneous
equation and dividing by $q^n$,

$a q^{-2} p(q) = b.$

If $p(q) \neq 0$, we find

$a = \frac{bq^2}{p(q)};$

the method breaks down, however, if $q$ is a zero of the characteristic
polynomial.

Similarly, if the elements of $B$ are polynomials in $n$, there often exist
solutions $X$ whose elements are polynomials of the same degree in $n$.
These solutions can be found by the method of undetermined coefficients.
Again the method may break down in exceptional cases.
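
The "inspired guess" for a right-hand side of the form $bq^n$ amounts to one division, as the following Python sketch shows. It is an illustration only: the function name `exponential_particular` and the test equation $x_n - 5x_{n-1} + 6x_{n-2} = 4^n$ are assumptions made for this example. The exceptional case $p(q) = 0$ is reported as an error rather than handled.

```python
def exponential_particular(a1, a2, b, q):
    """Coefficient a such that x_n = a*q^n solves x_n + a1 x_{n-1} + a2 x_{n-2} = b q^n."""
    p_q = q * q + a1 * q + a2  # characteristic polynomial evaluated at q
    if p_q == 0:
        raise ValueError("q is a zero of the characteristic polynomial")
    return b * q * q / p_q


# Assumed test equation: x_n - 5 x_{n-1} + 6 x_{n-2} = 4^n, so p(4) = 2.
a = exponential_particular(-5.0, 6.0, 1.0, 4.0)


def x(n):
    return a * 4.0 ** n


# Largest defect of the difference equation over a few values of n.
defect = max(abs(x(n) - 5.0 * x(n - 1) + 6.0 * x(n - 2) - 4.0 ** n)
             for n in range(2, 10))
print(a, defect)
```

With $q = 2$ or $q = 3$, the zeros of $p(z) = z^2 - 5z + 6$, the function raises an error, which is the breakdown mentioned above.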


EXAMPLE

10. To find a particular solution of

$x_n - x_{n-1} - x_{n-2} = -n^2.$

We set $x_n = an^2 + bn + c$, with constants $a$, $b$, $c$ to be determined.
Substitution yields

$a[n^2 - (n-1)^2 - (n-2)^2] + b[n - (n-1) - (n-2)] + c[1 - 1 - 1] = -n^2$

or, after simplification,

$a(-n^2 + 6n - 5) + b(-n + 3) + c(-1) = -n^2.$

Comparing coefficients of like powers of $n$, we get

$-a = -1,$
$6a - b = 0,$
$-5a + 3b - c = 0,$

yielding $a = 1$, $b = 6$, $c = 13$. Thus we have found the solution

$x_n = n^2 + 6n + 13.$


Problems

12. Find general solutions of the difference equations

(a) $x_n - 5x_{n-1} + 6x_{n-2} = 2$;
(b) $x_n - 5x_{n-1} + 6x_{n-2} = 2^n$;
(c) $x_n - x_{n-1} - 2x_{n-2} = n^2$.

13. Find the solution of equation (a) above that satisfies the initial condition

[illegible]

14. Determine a general solution of the difference equation

[illegible]

What relation must hold between the values $x_0$ and $x_1$ of a particular
solution $\{x_n\}$ of the above equation in order that $|x_n| \le C$ for some
constant $C$ and for all $n > 0$?

15. Difference equations have remarkable applications in the theory of
economics. Let $y_n$ be the national income in the year $n$, $a$ the marginal
propensity to consume, and $b$ the ratio of private investment to increase
in consumption. Assuming that government expenditure is constant
and equal to 1, a certain economic theory states that the following
difference equation for $y_n$ holds (see S. Goldberg [1958], p. 6):

$y_n = a y_{n-1} + ab(y_{n-1} - y_{n-2}) + 1.$

(a) Solve this equation with $a = 0.5$, $b = 1$ under the initial condition

$y_{1946} = 2, \quad y_{1947} = 3.$

(b) How frequent are depressions in this economy?
(c) For what values of $a$, $b$ is the economy thus described noninflationary?


6.6 Variation of Constants

The methods discussed in §6.5 for finding a particular solution of the
nonhomogeneous equation are, at best, heuristic and furnish a solution
only in special cases. With differential equations, the method of variation
of constants permits us to find a solution of any nonhomogeneous equation
if the general solution of the homogeneous equation is known (see
Coddington [1962], p. 67). A similar formula exists for difference
equations. For simplicity we shall consider the case in which the sequence
$B$ is defined on the set of nonnegative integers.

Theorem 6.6 Let $X^{(1)} = \{x_n^{(1)}\}$ and $X^{(2)} = \{x_n^{(2)}\}$ be two linearly
independent solutions of the homogeneous equation $\mathcal{L}X = 0$, and
let $W = \{w_n\}$ be the sequence of their Wronskian determinants. If
$B = \{b_n\}$ is defined on the set of nonnegative integers, a solution $X$ of
$\mathcal{L}X = B$ is given by

(6-12)  $x_n = \sum_{m=0}^{n} \frac{x_n^{(1)} x_{m-1}^{(2)} - x_n^{(2)} x_{m-1}^{(1)}}{w_m}\, b_m.$

Proof. Formula (6-12) can be verified by direct computation. A different
proof, which shows how the formula is obtained, is as follows.
It is clear that, for a given difference operator $\mathcal{L}$, the elements $x_n$ of the
required solution are linear functions of the preceding elements of the
sequence $B$. We may thus write

(6-13)  $x_n = \sum_{m=0}^{n} d_{n,m} b_m,$

where the coefficients $d_{n,m}$ depend only on $\mathcal{L}$ and not on the particular
sequence $B$. The requirement that the sequence $X = \{x_n\}$ defined by
(6-13) satisfies $\mathcal{L}X = B$ leads to

$x_n + a_1 x_{n-1} + a_2 x_{n-2} = \sum_{m=0}^{n} d_{n,m} b_m + a_1 \sum_{m=0}^{n-1} d_{n-1,m} b_m + a_2 \sum_{m=0}^{n-2} d_{n-2,m} b_m$
$\qquad = b_n,$

or, collecting factors of $b_m$,

$\sum_{m=0}^{n-2} (d_{n,m} + a_1 d_{n-1,m} + a_2 d_{n-2,m}) b_m$
$\qquad + (d_{n,n-1} + a_1 d_{n-1,n-1}) b_{n-1} + d_{n,n} b_n = b_n.$

This identity must hold for all $n \ge 0$, no matter how the sequence $B$ is
chosen. It follows that for each $m$, the coefficients of $b_m$ on both sides
must be equal. On the right, there is a nonzero coefficient only for
$m = n$. This leads to the relations

(6-14a)  $d_{n,m} + a_1 d_{n-1,m} + a_2 d_{n-2,m} = 0, \quad m = 0, 1, \ldots, n - 2,$
(6-14b)  $d_{n,n-1} + a_1 d_{n-1,n-1} = 0,$
(6-14c)  $d_{n,n} = 1.$

Formula (6-14a) shows that for a fixed value of $m$, the sequence $D = \{d_{n,m}\}$
is a solution of $\mathcal{L}D = 0$ for $n > m + 1$. Equation (6-14c) yields a first
initial condition $d_{m,m} = 1$. If we impose the further initial condition
$d_{m-1,m} = 0$, then (6-14b) can be replaced by (6-14a), where $n = m + 1$.
The sequence $D$ is thus characterized as solving $\mathcal{L}D = 0$ for $n > m$ and
satisfying the initial conditions

(6-15)  $d_{m,m} = 1, \quad d_{m-1,m} = 0.$

By theorem 6.3 there must exist constants $c_m^{(1)}$, $c_m^{(2)}$, depending only on $m$,
such that

$d_{n,m} = c_m^{(1)} x_n^{(1)} + c_m^{(2)} x_n^{(2)}.$

The initial conditions (6-15) are satisfied if

$c_m^{(1)} x_m^{(1)} + c_m^{(2)} x_m^{(2)} = 1,$
$c_m^{(1)} x_{m-1}^{(1)} + c_m^{(2)} x_{m-1}^{(2)} = 0.$

The solution of this system of linear equations is readily found to be

$c_m^{(1)} = \frac{x_{m-1}^{(2)}}{w_m}, \quad c_m^{(2)} = -\frac{x_{m-1}^{(1)}}{w_m}.$

We thus find

(6-16)  $d_{n,m} = \frac{x_n^{(1)} x_{m-1}^{(2)} - x_n^{(2)} x_{m-1}^{(1)}}{w_m}.$

Substituting this into (6-13), we obtain (6-12).

Although theorem 6.6 does provide a solution to the nonhomogeneous
equation that works in all cases, there is of course no guarantee that the
sum appearing in (6-13) can be expressed in any simple form. (Neither
can the integrals appearing in the variation-of-constants formula for
differential equations always be expressed in closed form.) But formula
(6-13) can be effective in cases where heuristic methods fail.

EXAMPLE

11. To find a particular solution of

(6-17)  $x_n - 2x_{n-1} + x_{n-2} = 1.$

By example 2, two linearly independent solutions of the homogeneous
equation are given by

$x_n^{(1)} = 1, \quad x_n^{(2)} = n.$

Their Wronskian determinants are, by example 5, $w_n = -1$. Formula
(6-13) thus yields in view of $b_n = 1$ ($n = 0, 1, 2, \ldots$)

$x_n = \sum_{m=0}^{n} (n + 1 - m).$

Changing the index of summation from $m$ to $p$, where $p = n + 1 - m$,
we have

$x_n = \sum_{p=1}^{n+1} p = 1 + 2 + 3 + \cdots + (n + 1).$

By a well-known summation formula we obtain

$x_n = \frac{(n + 1)(n + 2)}{2}.$

We leave it to the reader to verify that this indeed is a solution of (6-17).


Problems

16. [illegible] of the difference equation

$x_n - (a + b)x_{n-1} + ab\,x_{n-2} = a^n.$

(Distinguish the cases $a = b$ and $a \neq b$.)

17*. Show that formula (6-13) is valid for linear difference equations with
variable coefficients.


6.7 The Linear Difference Equation of Order N

The theory developed in the sections §6.3 through §6.6 for difference
equations of order two carries over, without essential change, to equations
of arbitrary order. We now consider the difference operator $\mathcal{L}$ of order
$N$ defined by

(6-18)  $(\mathcal{L}X)_n = x_n + a_1 x_{n-1} + \cdots + a_N x_{n-N},$

where $a_1, \ldots, a_N$ are arbitrary (real or complex) constants, $a_N \neq 0$.
We are interested in both the homogeneous equation

(6-19)  $\mathcal{L}X = 0,$

where 0 denotes the null sequence, and the nonhomogeneous equation

(6-20)  $\mathcal{L}X = B,$

where $B$ denotes an arbitrary given sequence.


136 elements of numerical analysis 


The following analogs of the results given above hold and are proved 
similarly. 


Lemma 6.3a' Any linear combination of two solutions of $\mathcal{L}X = 0$
is again a solution.


Lemma 6.3b' Let $I$ be a set of consecutive integers, let the integers
$m, m-1, \ldots, m-N+1$ be in $I$, and let the sequence $B$ be defined
on $I$. Then the difference equation $\mathcal{L}X = B$ has precisely one
solution which assumes given values for $n = m, m-1, \ldots, m-N+1$.


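Again we have the corollary that if a solution of $\mathcal{L}X = 0$ vanishes at $N$
consecutive integers, it vanishes identically.

Let now the sequences $X^{(1)} = \{x_n^{(1)}\}$, $X^{(2)} = \{x_n^{(2)}\}$, ..., $X^{(N)} = \{x_n^{(N)}\}$ be
$N$ solutions of $\mathcal{L}X = 0$. Their Wronskian determinant at the point $n$ is
defined by

$$w_n = \begin{vmatrix} x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(N)} \\ x_{n-1}^{(1)} & x_{n-1}^{(2)} & \cdots & x_{n-1}^{(N)} \\ \vdots & \vdots & & \vdots \\ x_{n-N+1}^{(1)} & x_{n-N+1}^{(2)} & \cdots & x_{n-N+1}^{(N)} \end{vmatrix}.$$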


The following result is obtained as in §6.3:


Theorem 6.3 Let $X^{(1)}, \ldots, X^{(N)}$ be $N$ solutions of $\mathcal{L}X = 0$. Then
every solution of $\mathcal{L}X = 0$ can be expressed as a linear combination of
$X^{(1)}, \ldots, X^{(N)}$ if and only if the Wronskian determinant of these
solutions is different from zero for all values of $n$.


$N$ sequences $A^{(1)}, A^{(2)}, \ldots, A^{(N)}$ are called linearly dependent if there
exist constants $c_1, c_2, \ldots, c_N$, not all zero, such that

$$c_1 A^{(1)} + c_2 A^{(2)} + \cdots + c_N A^{(N)}$$

is the zero sequence. If no such constants exist, the sequences are called
linearly independent. As in §6.4 we can show:


Theorem 6.4 Let $X^{(1)}, \ldots, X^{(N)}$ be $N$ solutions of $\mathcal{L}X = 0$, and let
$W = \{w_n\}$ be the sequence of their Wronskian determinants.

(i) If the solutions $X^{(1)}, \ldots, X^{(N)}$ are linearly dependent, then $W$ is
the zero sequence.

(ii) If $W$ contains a zero element, then the solutions $X^{(1)}, \ldots, X^{(N)}$
are linearly dependent.


It follows that if $w_m \neq 0$ for some integer $m$, then $w_n \neq 0$ for all $n$.
Thus the condition of theorem 6.3 is satisfied if the solutions $X^{(1)}, \ldots, X^{(N)}$
are linearly independent, and it follows that every solution of $\mathcal{L}X = 0$ can


linear difference equations 137 


be expressed as a linear combination of a system of linearly independent 
solutions. 

We now turn to the nonhomogeneous equation $\mathcal{L}X = B$. As above,
we have


Theorem 6.5' Let $Y$ be a special solution of $\mathcal{L}Y = B$, and let
$X^{(1)}, \ldots, X^{(N)}$ be a system of linearly independent solutions of the
corresponding homogeneous equation $\mathcal{L}X = 0$. Then every solution
$X$ of $\mathcal{L}X = B$ can be expressed in the form

$$X = c_1 X^{(1)} + \cdots + c_N X^{(N)} + Y,$$

where $c_1, \ldots, c_N$ are suitable constants.


Particular solutions of the nonhomogeneous equation can frequently 
be found by guessing the general form of the solution and determining the 
parameters such that the equation is satisfied. A more generally applicable 
method is given by 


Theorem 6.6' Let $X^{(1)} = \{x_n^{(1)}\}, \ldots, X^{(N)} = \{x_n^{(N)}\}$ be $N$ linearly
independent solutions of the homogeneous equation $\mathcal{L}X = 0$, and
let $W = \{w_n\}$ be the sequence of their Wronskian determinants. If
$B = \{b_n\}$ is defined on the set of nonnegative integers, then a solution
$X$ of $\mathcal{L}X = B$ is given by

(6-21) $$x_n = \sum_{m=0}^{n} \frac{1}{w_m} \begin{vmatrix} x_n^{(1)} & x_n^{(2)} & \cdots & x_n^{(N)} \\ x_{m-1}^{(1)} & x_{m-1}^{(2)} & \cdots & x_{m-1}^{(N)} \\ \vdots & \vdots & & \vdots \\ x_{m-N+1}^{(1)} & x_{m-N+1}^{(2)} & \cdots & x_{m-N+1}^{(N)} \end{vmatrix} \, b_m.$$


Problem


18*. Let $W$ denote the sequence of the Wronskian determinants of $N$ solutions
of $\mathcal{L}X = 0$. Find a linear difference equation of order 1 satisfied by $W$.
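
Formula (6-21) can be checked numerically on a small case. The following is a minimal sketch, not part of the original text: the helper names `det` and `particular_solution` are hypothetical, and the determinant is evaluated by naive Laplace expansion, which is adequate only for small $N$. It is applied to example 11, $x_n - 2x_{n-1} + x_{n-2} = 1$, with the homogeneous solutions $1$ and $n$.

```python
from fractions import Fraction

def det(rows):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(rows) == 1:
        return rows[0][0]
    total = 0
    for j, entry in enumerate(rows[0]):
        minor = [r[:j] + r[j + 1:] for r in rows[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def particular_solution(basis, b, n):
    """x_n according to (6-21).

    basis: list of N functions k -> x_k^{(i)} (integer-valued in this sketch)
    b:     function m -> b_m
    """
    N = len(basis)
    x_n = Fraction(0)
    for m in range(n + 1):
        # rows with indices m-1, ..., m-N+1 (shared by both determinants)
        lower = [[f(m - j) for f in basis] for j in range(1, N)]
        w_m = det([[f(m) for f in basis]] + lower)   # Wronskian at the point m
        d_nm = det([[f(n) for f in basis]] + lower)  # first row taken at index n
        x_n += Fraction(d_nm, w_m) * b(m)
    return x_n

# Example 11: homogeneous solutions 1 and n, right-hand side b_m = 1.
basis = [lambda k: 1, lambda k: k]
xs = [particular_solution(basis, lambda m: 1, n) for n in range(8)]
print(xs)  # values of (n + 1)(n + 2)/2
```

The computed values agree with the closed form $(n+1)(n+2)/2$ and satisfy the difference equation for $n \geq 2$.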


6.8 A System of Linearly Independent Solutions 


To complete the discussion of the linear difference equation of order $N$
with constant coefficients, we need to determine a system of $N$ linearly
independent solutions of the homogeneous equation $\mathcal{L}X = 0$. Special
solutions of $\mathcal{L}X = 0$ can again be found by means of the characteristic
polynomial

$$p(z) = z^N + a_1 z^{N-1} + \cdots + a_N$$


associated with the operator $\mathcal{L}$. If $z$ is a zero of the characteristic
polynomial, it is easy to see that the sequence $X = \{x_n\}$ defined by $x_n = z^n$ is a
solution of $\mathcal{L}X = 0$, for we have

$$(\mathcal{L}X)_n = x_n + a_1 x_{n-1} + \cdots + a_N x_{n-N} = z^n + a_1 z^{n-1} + \cdots + a_N z^{n-N} = z^{n-N} p(z) = 0.$$

We now distinguish two cases.

(i) The polynomial $p$ has $N$ distinct zeros $z_1, z_2, \ldots, z_N$. Then the
sequences $X^{(k)} = \{z_k^n\}$ ($k = 1, 2, \ldots, N$) represent $N$ solutions of $\mathcal{L}X = 0$.
Their Wronskian at $n = N - 1$ is

$$w_{N-1} = \begin{vmatrix} z_1^{N-1} & z_2^{N-1} & \cdots & z_N^{N-1} \\ z_1^{N-2} & z_2^{N-2} & \cdots & z_N^{N-2} \\ \vdots & \vdots & & \vdots \\ z_1 & z_2 & \cdots & z_N \\ 1 & 1 & \cdots & 1 \end{vmatrix}.$$

This determinant is known to have the value

$$\prod_{m<n} (z_m - z_n) \neq 0.$$

It follows that the solutions found are linearly independent.

(ii) General case: The polynomial $p$ has zeros of multiplicity $> 1$.
The above method does not furnish sufficiently many solutions. However,
further solutions can be found by differentiation. Let $z$ be a zero of
multiplicity $k + 1$ where $k > 0$. We then have

(6-22) $$p(z) = p'(z) = \cdots = p^{(k)}(z) = 0.$$

It was shown above that the sequence $X = \{z^n\}$ is a solution. We now
assert that also the sequences $X^{(1)}, X^{(2)}, \ldots, X^{(k)}$ defined by

$$x_n^{(1)} = n z^{n-1},$$
$$x_n^{(2)} = n(n-1) z^{n-2},$$
$$\cdots$$
$$x_n^{(k)} = n(n-1)\cdots(n-k+1) z^{n-k},$$

are solutions. Indeed, if $0 \leq m \leq k$, then

$$(\mathcal{L}X^{(m)})_n = x_n^{(m)} + a_1 x_{n-1}^{(m)} + \cdots + a_N x_{n-N}^{(m)}$$
$$= n(n-1)\cdots(n-m+1) z^{n-m} + a_1 (n-1)(n-2)\cdots(n-m) z^{n-1-m} + \cdots + a_N (n-N)(n-N-1)\cdots(n-N-m+1) z^{n-N-m}$$
$$= (z^{n-N} p(z))^{(m)}.$$

By the Leibnitz rule for differentiating a product (see Kaplan [1953],
p. 19),

$$(z^{n-N} p(z))^{(m)} = z^{n-N} p^{(m)}(z) + \binom{m}{1} (n-N) z^{n-N-1} p^{(m-1)}(z) + \cdots + (n-N)(n-N-1)\cdots(n-N-m+1) z^{n-N-m} p(z),$$

which vanishes for all $n$, by (6-22).

We are thus led to the following analog of theorem 6.2:

Theorem 6.8 Let the characteristic polynomial of the $N$th order
equation $\mathcal{L}X = 0$ have the distinct zeros $z_1, z_2, \ldots, z_k$ ($k \leq N$), and
let the multiplicity of $z_i$ be $m_i + 1$ ($\sum m_i = N - k$). Then the
sequences defined by

(6-23) $$x_n = n(n-1)\cdots(n-m+1) z_i^{n-m}, \quad m = 0, 1, \ldots, m_i; \; i = 1, \ldots, k,$$

form a system of $N$ linearly independent solutions of $\mathcal{L}X = 0$.

We omit the proof of the fact that the $N$ solutions given by (6-23) are
linearly independent.

EXAMPLES

12. Let

$$x_n - 2x_{n-2} + x_{n-4} = 0.$$

The characteristic polynomial

$$p(z) = z^4 - 2z^2 + 1 = (z + 1)^2 (z - 1)^2$$

has the two distinct zeros $z_1 = 1$, $z_2 = -1$, each with multiplicity 2.
Theorem 6.8 thus yields the four solutions

$$x_n = 1, \quad x_n = n, \quad x_n = (-1)^n, \quad x_n = n(-1)^{n-1}.$$

13. We shall show: If $z$ is a zero of multiplicity 4 of the characteristic
polynomial, then the sequence $X = \{n^3 z^n\}$ is a solution. Proof: We seek
to represent $X$ as a linear combination of the solutions given by theorem
6.8. We have

$$n(n-1)(n-2) = n^3 - 3n^2 + 2n,$$
$$n(n-1) = n^2 - n,$$
$$n = n,$$

and hence

$$n^3 = n(n-1)(n-2) + 3n(n-1) + n.$$

Thus

$$n^3 z^n = z^3 [n(n-1)(n-2) z^{n-3}] + 3z^2 [n(n-1) z^{n-2}] + z [n z^{n-1}],$$

which is a combination of the desired form.
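
The claim of example 13 can be spot-checked in exact integer arithmetic. The sketch below is illustrative only: the polynomial $(z-2)^4$ and the helper name `residual` are assumptions chosen for the check, not taken from the text. It evaluates $(\mathcal{L}X)_n$ for $x_n = n^3 z^n$ and, for contrast, for $x_n = n^4 z^n$, which should not be a solution.

```python
def residual(coeffs, x, n):
    """(LX)_n = x_n + a_1 x_{n-1} + ... + a_N x_{n-N}, with coeffs = [1, a_1, ..., a_N]."""
    return sum(a * x(n - k) for k, a in enumerate(coeffs))

# (z - 2)^4 = z^4 - 8 z^3 + 24 z^2 - 32 z + 16: the zero z = 2 has multiplicity 4.
coeffs = [1, -8, 24, -32, 16]
z = 2

cubic_residuals = [residual(coeffs, lambda n: n**3 * z**n, n) for n in range(4, 12)]
quartic_residuals = [residual(coeffs, lambda n: n**4 * z**n, n) for n in range(4, 12)]

# The factorial-power decomposition used in the proof of example 13.
decomposition_ok = all(
    n**3 == n * (n - 1) * (n - 2) + 3 * n * (n - 1) + n for n in range(-5, 6)
)

print(cubic_residuals)    # all zero: n^3 z^n solves the equation
print(quartic_residuals)  # nonzero: n^4 z^n does not
```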
Example 13 leads us to conjecture the following fact: 


Corollary 6.8 Under the hypotheses of theorem 6.8 a set of $N$
independent solutions of $\mathcal{L}X = 0$ is also given by the sequences

(6-24) $$x_n = n^m z_i^n, \quad m = 0, 1, \ldots, m_i; \; i = 1, 2, \ldots, k.$$


The proof that the sequences $X$ defined by (6-24) are solutions boils
down to showing that $n^m$ can be written as a linear combination of

$$n, \; n(n-1), \; \ldots, \; n(n-1)\cdots(n-m+1)$$

with coefficients that are independent of $n$. This is readily verified by an
induction argument. The proof that the $N$ sequences given by (6-24) are
linearly independent is again omitted.


EXAMPLE

14. To find the general solution of the difference equation

$$x_n - \binom{N}{1} x_{n-1} + \binom{N}{2} x_{n-2} - \cdots + (-1)^N \binom{N}{N} x_{n-N} = 0.$$

The characteristic polynomial is

$$p(z) = z^N - \binom{N}{1} z^{N-1} + \binom{N}{2} z^{N-2} - \cdots + (-1)^N.$$

As is known from the binomial theorem,

$$p(z) = (z - 1)^N.$$

It follows that $z = 1$ is a zero of multiplicity $N$. Corollary 6.8 yields the
solutions $1, n, n^2, \ldots, n^{N-1}$, and the general solution can be written in the
form

$$x_n = c_0 + c_1 n + c_2 n^2 + \cdots + c_{N-1} n^{N-1}.$$


Problems 


19. Let $N > 0$ be an integer. Find a general solution of the difference
equation

$$x_n + x_{n-1} + x_{n-2} + \cdots + x_{n-N} = 0.$$

Deduce that for any real angle $\alpha$ and any integer $m$, $1 \leq m \leq N$, if
$\varphi = 2\pi m/(N + 1)$,

$$\cos\alpha + \cos(\alpha + \varphi) + \cdots + \cos(\alpha + N\varphi) = 0,$$
$$\sin\alpha + \sin(\alpha + \varphi) + \cdots + \sin(\alpha + N\varphi) = 0.$$

20. Construct a difference equation that has the solutions
(a) $x_n = 1$, $x_n = 2^n$, $x_n = 3^n$;
(b) ea ἘΞ ἢ, a= ae: oS ἐτν 


Also find a difference equation that has both the solutions given under 
(a) and (b). 

21*. Let $0 \leq \varphi_k < 2\pi$, $k = 1, 2, \ldots, N$, $\varphi_k \neq \varphi_l$ for $k \neq l$, and let the numbers
$\alpha_k$ be arbitrary. Show that the $N$ sequences


XO) = (ρίας tn}, i We eng Ν 


are linearly independent. 
6.9 The Backward Difference Operator 


An important special difference operator of the kind defined by (6-18)
is the backward difference operator, traditionally denoted by $\nabla$ (read
"nabla" from the Arabic word for harp) and defined by

(6-25) $$(\nabla X)_n = x_n - x_{n-1}.$$

This difference operator is of order 1. It is closely related to the forward
difference operator $\Delta$ which was introduced in §4.4. In fact, we have

$$(\Delta X)_n = (\nabla X)_{n+1}.$$

Thus every relation valid for the forward difference operator can also be
expressed in terms of the backward operator, and conversely. In the
framework of our present notations it is a little more convenient to work
with the backward operator.

Integral powers of the operator $\nabla$ are defined inductively by the relation

(6-26) $$\nabla^k X = \nabla(\nabla^{k-1} X), \quad k = 2, 3, \ldots.$$

EXAMPLE

15. $\nabla^2 X = \nabla(\nabla X)$, hence

$$(\nabla^2 X)_n = (\nabla X)_n - (\nabla X)_{n-1} = (x_n - x_{n-1}) - (x_{n-1} - x_{n-2}) = x_n - 2x_{n-1} + x_{n-2}.$$

The example suggests an explicit expression for $(\nabla^k X)_n$ in terms of
binomial coefficients. In fact,

(6-27) $$(\nabla^k X)_n = x_n - \binom{k}{1} x_{n-1} + \binom{k}{2} x_{n-2} - \cdots + (-1)^k \binom{k}{k} x_{n-k}.$$

The proof of (6-27) is by induction with respect to $k$. By definition, the
formula is true for $k = 1$. Assuming its truth for $k = m$, we have

$$(\nabla^{m+1} X)_n = (\nabla^m X)_n - (\nabla^m X)_{n-1}$$
$$= x_n - \binom{m}{1} x_{n-1} + \binom{m}{2} x_{n-2} - \cdots + (-1)^m x_{n-m}$$
$$\quad - \left[ x_{n-1} - \binom{m}{1} x_{n-2} + \cdots + (-1)^{m-1} \binom{m}{m-1} x_{n-m} + (-1)^m x_{n-m-1} \right].$$

By virtue of the identity (3-13) this expression simplifies to

$$x_n - \binom{m+1}{1} x_{n-1} + \binom{m+1}{2} x_{n-2} - \cdots + (-1)^{m+1} x_{n-m-1},$$

which is equal to the term on the right of (6-27) when $k$ is replaced by
$m + 1$. Thus, (6-27) must be true for all positive integers $k$.
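
As a numerical cross-check of (6-27), the following sketch compares the inductive definition (6-26) with the binomial formula. The function names and the test sequence $x_n = 3^n + n^2$ are illustrative assumptions; any integer sequence would serve.

```python
from math import comb

def nabla(x):
    """Backward difference (6-25) of a sequence given as a function n -> x_n."""
    return lambda n: x(n) - x(n - 1)

def nabla_power(x, k):
    """k-fold application of nabla, following the inductive definition (6-26)."""
    for _ in range(k):
        x = nabla(x)
    return x

def nabla_binomial(x, k, n):
    """Right-hand side of (6-27)."""
    return sum((-1) ** j * comb(k, j) * x(n - j) for j in range(k + 1))

x = lambda n: 3**n + n * n  # arbitrary test sequence, evaluated only for n >= 0

agree = all(
    nabla_power(x, k)(n) == nabla_binomial(x, k, n)
    for k in range(1, 6)
    for n in range(6, 12)
)
print(agree)
```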
In practice, the operator $\nabla$ is mainly used in connection with sequences
$F$ defined by

$$f_n = f(x_n),$$

where $x_n = nh$, $h$ being a positive constant. One ordinarily writes $\nabla f_n$
in place of the logically more consistent, but cumbersome notation
$(\nabla F)_n$. The values of successive powers of $\nabla$ are conveniently arranged in
a two-dimensional array, as in table 6.9a.


Table 6.9a

$f_n$
            $\nabla f_{n+1}$
$f_{n+1}$                   $\nabla^2 f_{n+2}$
            $\nabla f_{n+2}$                   $\nabla^3 f_{n+3}$
$f_{n+2}$                   $\nabla^2 f_{n+3}$
            $\nabla f_{n+3}$
$f_{n+3}$


Table 6.9a is called the difference table of the function $f$, constructed
with the step $h$. Each entry in the table is the difference of the two entries
immediately to the left of it.


EXAMPLE

16. From Comrie's table of the exponential function $f(x) = e^x$ (Comrie
[1961]), which has the step $h = 10^{-3}$, we can construct the following
difference table (beginning at $x_n = 1.35$):


Table 6.9b

$x_n$     $f_n$        $\nabla f_n$     $\nabla^2 f_n$

1.350   3.857426
                     0.003859
1.351   3.861285                 0.000004
                     0.003863
1.352   3.865148                 0.000004
                     0.003867
1.353   3.869015

The study of properties and applications of the difference table, in 
particular the relation between differences and derivatives of a function, 
will be one of our chief concerns in part II. Here we are content with 
stating the following fundamental fact. 


Theorem 6.9 Let the function $f$ be defined on the whole real line,
let $k$ be a positive integer, and let $h > 0$. If $f_n = f(nh)$, a necessary
and sufficient condition in order that

(6-28) $$\nabla^k f_n = 0$$

for all integers $n$ is that $f(nh) = P(nh)$ for all $n$, where $P$ is a
polynomial of degree not exceeding $k - 1$.


Briefly, the theorem states that the $k$th differences of a function are
identically zero if and only if, at the points where the differences are taken,
the function agrees with a polynomial of degree $< k$.


Proof. We have to show: (a) If the $k$th differences are zero, then the
values of $f$ at $x = nh$ are those of a polynomial of degree $< k$; (b) If the
values of $f$ agree with those of a polynomial $P$ of degree $< k$ at all points
$x = nh$, then the $k$th differences are zero.

To prove (a), we note that (6-28) means that the sequence $F = \{f_n\}$ is a
solution of the difference equation

(6-29) $$\nabla^k F = 0.$$

By (6-27), the characteristic polynomial of this equation is

$$p(z) = z^k - \binom{k}{1} z^{k-1} + \binom{k}{2} z^{k-2} - \cdots + (-1)^k = (z - 1)^k.$$

This polynomial has a single zero, of multiplicity $k$, at $z = 1$. By
corollary 6.8, any solution of (6-29) can thus be represented in the form

$$f_n = c_0 + c_1 n + c_2 n^2 + \cdots + c_{k-1} n^{k-1}.$$

Clearly, $f_n = P(nh)$, where $P$ is the polynomial defined by

$$P(x) = c_0 + \frac{c_1}{h} x + \frac{c_2}{h^2} x^2 + \cdots + \frac{c_{k-1}}{h^{k-1}} x^{k-1}.$$

To prove (b), let $f(x) = P(x)$ for $x = nh$, where $P$ is a polynomial of
degree $< k$,

$$P(x) = a_0 x^{k-1} + a_1 x^{k-2} + \cdots + a_{k-1}.$$

We then have

$$f_n = P(nh) = a_0 (nh)^{k-1} + a_1 (nh)^{k-2} + \cdots + a_{k-1}.$$

This relation shows that the sequence $F = \{f_n\}$ is a linear combination of
the sequences $X^{(m)}$ defined by

$$x_n^{(m)} = n^m, \quad m = 0, 1, 2, \ldots, k - 1.$$

These sequences are solutions of the difference equation $\nabla^k X = 0$, by
corollary 6.8. It follows that the sequence $F$ is a solution of the same
difference equation, that is, (6-28) holds.

EXAMPLE 

17. Let $P(x) = x^3 - 2x + 1$. The difference table with step $h = 1$
(begun at $x = 0$) is shown in table 6.9c.


Table 6.9c

$P$      $\nabla P$    $\nabla^2 P$   $\nabla^3 P$   $\nabla^4 P$

  1
         -1
  0               6
          5               6
  5              12               0
         17               6
 22              18               0
         35               6
 57              24               0
         59               6
116              30               0
         89               6
205              36
        125
330
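
A difference table of this kind is easy to generate mechanically. The sketch below (the function name `difference_table` is a hypothetical helper) builds the columns for the polynomial of example 17 and illustrates theorem 6.9: for a cubic, the third differences are constant and the fourth differences vanish.

```python
def difference_table(values):
    """Return the columns of a difference table.

    columns[0] holds the function values, columns[1] the first differences,
    columns[2] the second differences, and so on.
    """
    columns = [list(values)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns

P = lambda x: x**3 - 2 * x + 1
cols = difference_table([P(x) for x in range(8)])  # step h = 1, begun at x = 0

for order, col in enumerate(cols[:5]):
    print(order, col)
```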


Problems 


22. If T = T(t) denotes the sequence of Chebyshev polynomials introduced in 
example 3, calculate 


(b) (V°TO)n+2- 


(a) (ἡ Τί) «1, 


23. Give an explicit formula for the differences of the exponential function,
$f(x) = e^x$, and show that these differences are never zero. Explain the
apparent contradiction with the numerical values given in table 6.9b.

24. The first differences given in table 6.9b are very nearly equal to $10^{-3}$ times
the average of the adjacent function values. Is this an accident?

25. Formula (6-27) expresses $(\nabla^k X)_n$ in terms of $x_n, x_{n-1}, \ldots, x_{n-k}$. Show
conversely how to express $x_{n-k}$ in terms of $(\nabla^0 X)_n, (\nabla X)_n, \ldots, (\nabla^k X)_n$.


Recommended Reading 


A more thorough treatment of difference equations will be found in 
Milne-Thomson [1933]. Goldberg [1958] gives an enjoyable elementary 
account with many interesting applications. 


Research Problem 


Try to generalize as many results of this chapter as possible to linear 
difference equations with variable coefficients. Do not attempt to find 
solutions in explicit form. 


chapter 7 Bernoulli's method


With the present chapter we return to the problem of solving nonlinear, 
and in particular polynomial, equations. The methods discussed in the 
chapters devoted to iteration are very effective, but only if a reasonable 
first approximation to the desired solution is known. How to obtain such 
a first approximation is a problem which, for equations without special 
properties, is of such generality that it cannot be solved by generally 
applicable rules or algorithms. For polynomials, however, there do exist 
algorithms that furnish the desired first approximation using no other 
information than the coefficients of the polynomial. Two such algorithms 
—a classical one due to D. Bernoulli and one of its modern extensions due 
to Rutishauser—form the subject of this and the next chapter. Bernoulli's 
method, in particular, is one which yields all dominant zeros of a poly- 
nomial. By a dominant zero we mean a zero whose modulus is not 
exceeded by the modulus of any other zero. 


7.1 Single Dominant Zero 


In chapter 6 we have seen how the general linear difference equation 
with constant coefficients can be solved analytically by determining the 
zeros of the associated characteristic polynomial. Bernoulli’s method 
consists in reversing this procedure. The polynomial whose zeros are 
sought is considered the characteristic polynomial of some difference 
equation, and this associated difference equation is solved numerically 
by solving the recurrence relation implied by it. From this solution it is 
easy to extract information about the zeros of the polynomial, as we 
shall see. 

To begin with the simplest case, let us assume that the polynomial of
degree $N$,

(7-1) $$p(z) = a_0 z^N + a_1 z^{N-1} + \cdots + a_N,$$

whose coefficients may be complex, has $N$ distinct zeros $z_1, z_2, \ldots, z_N$.
What happens if we solve the difference equation

(7-2) $$a_0 x_n + a_1 x_{n-1} + \cdots + a_N x_{n-N} = 0$$

which has (7-1) as its characteristic polynomial? According to §6.7, the
solution $X = \{x_n\}$ (whatever its starting values are) must be representable
in the form

(7-3) $$x_n = c_1 z_1^n + c_2 z_2^n + \cdots + c_N z_N^n,$$

where $c_1, c_2, \ldots, c_N$ are suitable constants. To proceed further we make
two assumptions:

(i) The polynomial $p$ has a single dominant zero, i.e., one of the zeros
(we may call it $z_1$) has a larger modulus than all the others:

(7-4) $$|z_1| > |z_k|, \quad k = 2, 3, \ldots, N.$$

(ii) The starting values are such that the dominant zero is represented
in the solution (7-3), i.e., we have

(7-5) $$c_1 \neq 0.$$

We now consider the ratio of two consecutive values of the solution
sequence $X$. Using (7-3) we find

$$\frac{x_{n+1}}{x_n} = \frac{c_1 z_1^{n+1} + c_2 z_2^{n+1} + \cdots + c_N z_N^{n+1}}{c_1 z_1^n + c_2 z_2^n + \cdots + c_N z_N^n}.$$

By virtue of (7-5) this may be written

(7-6) $$\frac{x_{n+1}}{x_n} = z_1 \, \frac{1 + \dfrac{c_2}{c_1} \left(\dfrac{z_2}{z_1}\right)^{n+1} + \cdots + \dfrac{c_N}{c_1} \left(\dfrac{z_N}{z_1}\right)^{n+1}}{1 + \dfrac{c_2}{c_1} \left(\dfrac{z_2}{z_1}\right)^{n} + \cdots + \dfrac{c_N}{c_1} \left(\dfrac{z_N}{z_1}\right)^{n}}.$$

By (7-4), $|z_k/z_1| < 1$ for $k = 2, 3, \ldots, N$. It follows that

(7-7) $$\left(\frac{z_k}{z_1}\right)^n \to 0 \quad \text{as } n \to \infty$$

for $k = 2, 3, \ldots, N$. The fraction multiplying $z_1$ thus tends to 1 as
$n \to \infty$, and we find

$$\lim_{n \to \infty} \frac{x_{n+1}}{x_n} = z_1.$$

We thus have the following tentative formulation of Bernoulli’s method: 


Algorithm 7.1 Choose arbitrary values $x_0, x_{-1}, \ldots, x_{-N+1}$, and
determine the sequence $\{x_n\}$ from the recurrence relation

$$x_n = -\frac{a_1 x_{n-1} + a_2 x_{n-2} + \cdots + a_N x_{n-N}}{a_0}, \quad n = 1, 2, \ldots.$$


148 elements of numerical analysis Bernoulli’s method 149 


Then form the sequence of quotients Table 7.1 | 
Re AU SEIT pal  ΠΎΥ ihe iinet acl. <1 2 bitmaps 
Tapas ᾿ θῇ : ha Ἢ (4 
(7-8) qn = — . ᾿ | q A"qn AG, dn 
: ἜΝ 
We have proved: , ‘2 
rete 1G ε ist mie At 1s a 2  2.7142857 : 
Theorem 7.1 Ifthe polynomial p given by (7-1) has a single dominant 2. 3.142857] 1.1578947 00910116 —0,1982597 0.9596350 
zero, and if the starting values are such that (7-5) holds, then the 4  3.3530611 1-9668831)__ 9) y4g9395 (00417791 —0.1214233 0.9454595 
quotients g, are ultimately defined and converge to the dominant 5 3,4122447 ρα τους _09.0292706 9:0199619  —0.0792746 0.9383760 
seen ak 6 3.3725945 9-9883800 pi e4e3 0.0108076 + —0.0536856 0.934644 
EXAMPLES 8 3.1331066 0-03/80°° — —0.0081653 pans 
: live m : 19 9 212.5753180  “" —0.0056093 “ἢ 
1, Applying Bernoulli’s method to the polynomial 10 28087866 99440290 Ones 0,0017041 
(2) Ξ-- Ζ', -Σ --ὶ 11 2.6406071 9.94012238ὁ _ 9 9997449 Ο0011603 
P 
τ 12 2.4752493 0.9313718. _9 ggig495 (90008024 
with the starting values x)» = 1, χ. = 0 yields the Fibonacci sequence 13 2.3154382 9:9354364 __ agi 3934 9.000561! 


considered in example 7 of chapter 6. Conditions (i) and (ii) are clearly 
satisfied here. The ratios of consecutive elements thus converge to the 
dominant zero $z_1 = (1 + \sqrt{5})/2$.

2. Let

$$p(z) = 70z^4 - 140z^3 + 90z^2 - 20z + 1.$$

The difference equation (7-2) (solved for $x_n$) here takes the form

$$x_n = \frac{140 x_{n-1} - 90 x_{n-2} + 20 x_{n-3} - x_{n-4}}{70}.$$

The first three columns of table 7.1 give the values of $n$, of the
sequence $x_n$ (started with $x_0 = 1$, $x_n = 0$ for $n < 0$) and of the sequence
$\{q_n\}$. (The other columns will be explained in §7.2.)
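
The computation of example 2 can be sketched in a few lines. This is a minimal illustration of algorithm 7.1 under stated assumptions: the function name `bernoulli_quotients` is hypothetical, floating-point arithmetic is used, the starting values are $x_0 = 1$, $x_n = 0$ for $n < 0$, and no guard is included for a vanishing $x_n$ (the quotients are only "ultimately defined" in general).

```python
def bernoulli_quotients(a, steps):
    """Bernoulli's method for p(z) = a[0] z^N + a[1] z^(N-1) + ... + a[N].

    Returns (xs, q) with xs[j] = x_j for j >= 0 and q[n] = x_{n+1} / x_n.
    """
    N = len(a) - 1
    x = [0.0] * (N - 1) + [1.0]  # x_{-N+1}, ..., x_{-1}, x_0
    q = []
    for _ in range(steps):
        # x[-1] is x_{n-1}, x[-2] is x_{n-2}, ..., x[-N] is x_{n-N}
        nxt = -sum(a[k] * x[-k] for k in range(1, N + 1)) / a[0]
        q.append(nxt / x[-1])
        x.append(nxt)
    return x[N - 1:], q

xs, q = bernoulli_quotients([70, -140, 90, -20, 1], 120)
print(xs[1:4])  # x_1, x_2, x_3
print(q[-1])    # approaches the dominant zero 0.93057...
```

The early quotients are far from the limit, which is the slow convergence taken up in §7.2.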

The preceding is a mere outline of Bernoulli’s method in the simplest 
possible case. A number of complications have yet to be dealt with, such 
as the following: 

1. Slow convergence. 

2. Zeros of multiplicity > 1. 

3. Unfortunate choice of initial values. 

4. Several dominant zeros.

5. Calculation of nondominant zeros. 


These questions will be dealt with in the subsequent sections. 


Problems 


1. Find, by Bernoulli's method, the dominant zero of the polynomial
$$p(z) = 32z^3 - 48z^2 + 18z - 1$$
to three significant digits.


14 2.1627466 9-9340550  __poop9gsg 0.003956 —0.0034633 0.9305917 
16 1.8815034 9.232360. __9 ggg5954 90001998 


ee το. 9:9. 


0.93057 0.93057 


2. Use the value obtained in problem 1 to obtain the dominant zero to six
significant digits, using either Newton's or Steffensen's method.

3. Look up the closed formula for finding the zeros of a cubic polynomial and
use it to determine the dominant zero of the polynomial of problem 1.

4. What is the meaning of the phrase "the quotients are ultimately defined"
in theorem 7.1? Give an example of a polynomial violating condition (i)
where infinitely many of the quotients $q_n$ are undefined.


7.2 Accelerating Convergence 


Even if the conditions of convergence of Bernoulli's method are satisfied,
the speed of convergence may be slow. By this we mean that the error of
the approximation $x_{n+1}/x_n$ to the zero $z_1$,

$$d_n = \frac{x_{n+1}}{x_n} - z_1,$$

tends to zero only slowly. As in the case of iteration, it may be possible to
speed up convergence by making judicious use of information about the
manner in which $d_n$ approaches zero. In order to discover this manner
of convergence, we shall analyze the errors $d_n$ more closely. We shall


continue to make the assumptions (i) and (ii) of §7.1. In addition, we
assume that

(7-9) $$|z_1| > |z_2| > |z_k|, \quad k = 3, 4, \ldots, N$$

(the next-to-dominant zero is the only zero of its modulus), and that

(7-10) $$c_2 \neq 0$$

in (7-3) (the next-to-dominant zero is represented in the solution $\{x_n\}$).
Under these hypotheses, the error

$$d_n = \frac{c_1 z_1^{n+1} + c_2 z_2^{n+1} + \cdots + c_N z_N^{n+1}}{c_1 z_1^n + c_2 z_2^n + \cdots + c_N z_N^n} - z_1$$
$$= \frac{c_1 z_1^{n+1} + c_2 z_2^{n+1} + \cdots + c_N z_N^{n+1} - z_1 (c_1 z_1^n + \cdots + c_N z_N^n)}{c_1 z_1^n + c_2 z_2^n + \cdots + c_N z_N^n}$$
$$= \frac{c_2 (z_2 - z_1) z_2^n + \cdots + c_N (z_N - z_1) z_N^n}{c_1 z_1^n + c_2 z_2^n + \cdots + c_N z_N^n}$$

can be written in the form

(7-11) $$d_n = A t^n (1 + \varepsilon_n),$$

where

$$A = \frac{c_2 (z_2 - z_1)}{c_1}, \quad t = \frac{z_2}{z_1},$$

$$\varepsilon_n = \frac{1 + \dfrac{c_3 (z_3 - z_1)}{c_2 (z_2 - z_1)} \left(\dfrac{z_3}{z_2}\right)^n + \cdots + \dfrac{c_N (z_N - z_1)}{c_2 (z_2 - z_1)} \left(\dfrac{z_N}{z_2}\right)^n}{1 + \dfrac{c_2}{c_1} \left(\dfrac{z_2}{z_1}\right)^n + \cdots + \dfrac{c_N}{c_1} \left(\dfrac{z_N}{z_1}\right)^n} - 1.$$

By virtue of condition (7-9), we have, in addition to (7-7),

$$\left(\frac{z_k}{z_2}\right)^n \to 0 \quad \text{as } n \to \infty$$

for $k = 3, 4, \ldots, N$, and the ratio on the right has the limit 1 as $n \to \infty$.
It follows that

(7-12) $$\lim_{n \to \infty} \varepsilon_n = 0.$$

As a consequence,

$$\frac{d_{n+1}}{d_n} = t \, \frac{1 + \varepsilon_{n+1}}{1 + \varepsilon_n} = t (1 + \delta_n),$$

where

$$\delta_n = \frac{\varepsilon_{n+1} - \varepsilon_n}{1 + \varepsilon_n} \to 0 \quad \text{as } n \to \infty.$$

The errors $d_n$ thus satisfy precisely the condition (4-12) of theorem 4.5
which makes Aitken's $\Delta^2$ process effective. We thus have as an immediate
consequence of theorem 4.5:

Theorem 7.2 Under the hypotheses stated at the beginning of the
present section, the sequence $\{q_n'\}$ derived from the sequence $\{q_n\}$ by
means of Aitken's $\Delta^2$ formula,

$$q_n' = q_n - \frac{(\Delta q_n)^2}{\Delta^2 q_n},$$

converges faster to the dominant zero $z_1$ than the sequence $\{q_n\}$ in the
sense that

$$\frac{q_n' - z_1}{q_n - z_1} \to 0 \quad \text{as } n \to \infty.$$

EXAMPLE

3. In table 7.1 the values $q_n'$ are shown in the last column. Also given
are the intermediate values required to calculate $q_n'$. The faster
convergence of the sequence $\{q_n'\}$ to the exact zero $z_1 = 0.93057$ is evident.


Problems

5. Find, by Bernoulli's method speeded up by the $\Delta^2$ process, the dominant
zero of the polynomial
$$p(z) = z^4 - 7z^3 + 13z^2 - 8z + 12.$$

6. Apply the $\Delta^2$ method to the sequence $\{q_n\}$ obtained in problem 1.

7. Explain why the $\Delta^2$ method does not speed up appreciably the Bernoulli
sequence for the polynomial
$$p(z) = z^3 - 4z^2 + 6z - 4.$$
($z = 2$ is the sole dominant zero.)

8. If suitable conditions are satisfied, the error of the sequence $\{q_n\}$ generated
by Bernoulli's method tends to zero like $(z_2/z_1)^n$. How does the error
of the accelerated sequence $\{q_n'\}$ tend to zero? What do you conclude
about the number of steps necessary to achieve a given accuracy
(a) if $|z_1| \approx |z_2| > |z_3|$,
(b) if $|z_1| > |z_2| \approx |z_3|$?


7.3 Zeros of Higher Multiplicity


If the polynomial $p$ has repeated nondominant zeros $z_2, \ldots, z_N$, then
formula (7-3) for the general solution of the difference equation (7-2) also
contains (by corollary 6.8) terms like $n^k z_2^n$. Thus the expression (7-6)
contains terms like

$$n^k \left(\frac{z_2}{z_1}\right)^n \quad \text{in addition to} \quad \left(\frac{z_2}{z_1}\right)^n.$$

However, these terms cannot disturb the convergence of the method, since
if $|q| < 1$, then we have not only $q^n \to 0$ as $n \to \infty$ but also $n^k q^n \to 0$ for
any fixed value of $k$.

The situation is different if the dominant zero has multiplicity $> 1$.
(We still assume that there is only one dominant zero.) To fix ideas, let
the multiplicity be 2. Relation (7-3) then takes the form

$$x_n = c_1 n z_1^n + c_2 z_1^n + c_3 z_3^n + \cdots,$$

where $|z_3| < |z_1|$. Thus the ratio (7-6) now becomes (still assuming
$c_1 \neq 0$)

(7-13) $$\frac{x_{n+1}}{x_n} = \frac{c_1 (n+1) z_1^{n+1} + c_2 z_1^{n+1} + c_3 z_3^{n+1} + \cdots}{c_1 n z_1^n + c_2 z_1^n + c_3 z_3^n + \cdots} = z_1 \, \frac{(n+1) c_1 + c_2 + c_3 \left(\dfrac{z_3}{z_1}\right)^{n+1} + \cdots}{n c_1 + c_2 + c_3 \left(\dfrac{z_3}{z_1}\right)^{n} + \cdots}.$$

Convergence still takes place, but, due to the factor

$$\frac{(n+1) c_1 + c_2}{n c_1 + c_2} = 1 + \frac{1}{n + c_2/c_1},$$

at a much slower rate; the error after $n$ steps is now of the order of $1/n$ as
against $(z_2/z_1)^n$ if the dominant zero has multiplicity one. An example for
this slowdown of convergence, as well as a possible remedy, is given in the
following section.
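
The $1/n$ behaviour can be observed directly. The sketch below is an illustration under stated assumptions: it uses the polynomial $(z-3)^2(z+1)^2 = z^4 - 4z^3 - 2z^2 + 12z + 9$, whose dominant zero 3 is double, with the starting values $x_0 = 1$, $x_n = 0$ for $n < 0$; the helper name `bernoulli_exact` is hypothetical, and exact rational arithmetic is used so that rounding plays no role.

```python
from fractions import Fraction

def bernoulli_exact(a, steps):
    """Exact Bernoulli sequence x_0, x_1, ..., x_steps for p(z) = a[0] z^N + ... + a[N]."""
    N = len(a) - 1
    x = [Fraction(0)] * (N - 1) + [Fraction(1)]  # x_{-N+1}, ..., x_{-1}, x_0
    for _ in range(steps):
        x.append(-sum(a[k] * x[-k] for k in range(1, N + 1)) / a[0])
    return x[N - 1:]

a = [1, -4, -2, 12, 9]          # (z - 3)^2 (z + 1)^2
x = bernoulli_exact(a, 201)     # x_0, ..., x_201

def err(n):
    """Error d_n = x_{n+1}/x_n - 3 of the nth quotient."""
    return float(x[n + 1] / x[n] - 3)

print(err(100), err(200))       # doubling n only halves the error
print(100 * err(100))           # n * d_n stays close to z_1 = 3
```

Doubling the number of steps roughly halves the error, in contrast to the geometric decrease obtained for a simple dominant zero.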


Problems 


9. Devise an Aitken-like acceleration scheme for the case of a single dominant
zero of multiplicity 2. [Sketch of a possible solution: From (7-13) we
find for the error $d_n = q_n - z_1$

$$d_n = \frac{c_1 z_1}{n c_1 + c_2} + O(t^n),$$

where $t = z_3/z_1$. Neglecting $O(t^n)$ and setting $c = c_2/c_1$, we have

(7-14) $$q_n - z_1 = \frac{z_1}{n + c}.$$

The unknowns $c$ and, if desired, $n$ can be eliminated from consecutive
relations (7-14) as in the derivation of Aitken's formula, yielding a
formula for $z_1$.]

10. Apply the procedure devised in problem 9 to the calculation of the
dominant zero of the polynomial

$$p(z) = z^4 - 4z^3 - 2z^2 + 12z + 9.$$


7.4 Choice of Starting Values


One of the conditions for convergence of algorithm 7.1 was that $c_1 \neq 0$.
It can be shown by means of complex variable theory that this condition is
always satisfied if the starting values are chosen as follows:

(7-15) $$x_{-N+1} = x_{-N+2} = \cdots = x_{-1} = 0, \quad x_0 = 1.$$

A different, more sophisticated choice of starting values is defined by
the following algorithm:


Algorithm 7.4 If the coefficients $a_0, a_1, \ldots, a_N$ are given, calculate
$x_0, x_1, \ldots, x_{N-1}$ by the formulas

$$x_0 = -\frac{a_1}{a_0},$$
$$x_1 = -\frac{1}{a_0} (2a_2 + a_1 x_0),$$
$$x_2 = -\frac{1}{a_0} (3a_3 + a_2 x_0 + a_1 x_1),$$

and generally

(7-16) $$x_k = -\frac{1}{a_0} \left[ (k+1) a_{k+1} + a_k x_0 + a_{k-1} x_1 + \cdots + a_1 x_{k-1} \right], \quad k = 1, 2, \ldots, N-1.$$


The starting values generated by algorithm 7.4 have the following very 
desirable property: 


Theorem 7.4 Let the polynomial $p(z) = a_0 z^N + a_1 z^{N-1} + \cdots + a_N$
have the distinct zeros $z_1, z_2, \ldots, z_M$ ($M \leq N$), and let the
multiplicity of $z_k$ be $m_k$ ($k = 1, 2, \ldots, M$). If the starting values for
Bernoulli's method are determined by algorithm 7.4, then relation
(7-3) takes the form

(7-17) $$x_n = m_1 z_1^{n+1} + m_2 z_2^{n+1} + \cdots + m_M z_M^{n+1}, \quad n = 0, 1, 2, \ldots.$$


The proof is again most easily accomplished by complex variable theory,
and is omitted here. Relation (7-17) is remarkable for the fact that no
powers of $n$ appear, notwithstanding the possible presence of zeros of
multiplicity higher than 1. The difficulty mentioned in §7.3 thus can
always be avoided by a proper choice of the starting values. The ratio
$x_{n+1}/x_n$ then converges at a rate determined only by the magnitude of the
two largest zeros.
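
Algorithm 7.4 and relation (7-17) can be tried out on a polynomial with repeated zeros. The following sketch is illustrative: the function names are hypothetical, exact rational arithmetic is assumed, and the test polynomial is $(z-3)^2(z+1)^2 = z^4 - 4z^3 - 2z^2 + 12z + 9$, for which (7-17) predicts $x_n = 2 \cdot 3^{n+1} + 2(-1)^{n+1}$.

```python
from fractions import Fraction

def starting_values(a):
    """x_0, ..., x_{N-1} from (7-16) for p(z) = a[0] z^N + ... + a[N]."""
    N = len(a) - 1
    x = []
    for k in range(N):
        # (k+1) a_{k+1} + a_k x_0 + a_{k-1} x_1 + ... + a_1 x_{k-1}
        s = (k + 1) * a[k + 1] + sum(a[k - j] * x[j] for j in range(k))
        x.append(Fraction(-s, a[0]))
    return x

def continue_sequence(a, x, count):
    """Extend x_0, ..., x_{N-1} to `count` terms with the recurrence (7-2)."""
    N = len(a) - 1
    x = list(x)
    while len(x) < count:
        x.append(-sum(a[k] * x[-k] for k in range(1, N + 1)) / a[0])
    return x

a = [1, -4, -2, 12, 9]                      # (z - 3)^2 (z + 1)^2
start = starting_values(a)
x = continue_sequence(a, start, 15)
predicted = [2 * 3 ** (n + 1) + 2 * (-1) ** (n + 1) for n in range(15)]

print(start)           # the four starting values
print(x == predicted)  # relation (7-17): no powers of n appear
print(float(x[13] / x[12]))
```

With these starting values the quotients approach 3 geometrically, like $(-1/3)^n$, rather than like $1/n$.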


EXAMPLE

4. We compare the sequences $\{q_n\}$ for the polynomial

$$p(z) = (z - 3)^2 (z + 1)^2 = z^4 - 4z^3 - 2z^2 + 12z + 9,$$

generated by starting Bernoulli's method by (7-15) and by algorithm 7.4.
The recurrence relation is in both cases

$$x_n = 4x_{n-1} + 2x_{n-2} - 12x_{n-3} - 9x_{n-4}.$$


Table 7.4 (the quotient $q_n = x_{n+1}/x_n$ is listed in the row of $n$; the last row gives the limit)

          Method (7-15)                Algorithm 7.4
  n       x_n           q_n            x_n           q_n

 -3       0
 -2       0
 -1       0
  0       1             4.00           4             5.00000
  1       4             4.50           20            2.60000
  2       18            3.78           52            3.15385
  3       68            3.69           164           2.95122
  4       251           3.54           484           3.01653
  5       888           3.46           1 460         2.99452
  6       3 076         3.40           4 372         3.00183
  7       10 456        3.35           13 124        2.99939
  8       35 061        3.32           39 364        3.00020
  9       116 252       3.29           118 100       2.99993
 10       381 974       3.26086        354 292       3.00002
 11       1 245 564     3.24000        1 062 884     2.99999
 12       4 035 631     3.22222        3 188 644     3.0000025
 13       13 003 696    3.20690        9 565 940     2.9999992
 14       41 701 512                   28 697 812
 limit                  3.00000                      3.0000000
Problems 


11. Use algorithm 7.4 to obtain starting values for the application of
Bernoulli's method to the polynomial of problem 10. Verify relation (7-17).

12. Using Vieta's formulas, verify (7-17) for $n = 0$ and $n = 1$ in the case of an
arbitrary polynomial.

13. Show that if the rational function

$$-\frac{a_1 + 2a_2 z + \cdots + N a_N z^{N-1}}{a_0 + a_1 z + a_2 z^2 + \cdots + a_N z^N}$$

is expanded in powers of $z$, then the coefficients of $1, z, \ldots, z^{N-1}$ are
identical with the numbers $x_0, x_1, \ldots, x_{N-1}$ defined by algorithm 7.4.


7.5 Two Conjugate Complex Dominant Zeros


The theory developed so far holds equally well whether the coefficients
of the polynomial $p$ are real or complex. However, we have always
assumed that $z_1$ is the sole dominant zero of $p$. Let us now consider the
case where $p$ is a polynomial with real coefficients which has a pair of
complex conjugate dominant zeros $z_1$ and $z_2 = \bar{z}_1$, both of multiplicity
one. The remaining zeros shall satisfy

(7-18) $$|z_1| = |z_2| > |z_k|, \quad k = 3, 4, \ldots, N;$$

for the purpose of our analysis we assume that the nondominant zeros,
too, have multiplicity one, although this is not essential for the result.

If the starting values for the sequence $\{x_n\}$ are real, equation (7-3) takes
the form

$$x_n = c_1 z_1^n + \bar{c}_1 \bar{z}_1^n + c_3 z_3^n + \cdots + c_N z_N^n.$$

Representing the complex numbers $c_1$ and $z_1$ in polar form, we write
$z_1 = r e^{i\varphi}$, $c_1 = a e^{i\delta}$, where $r > 0$ and $a > 0$. We may assume,
furthermore, that $z_1$ is the zero in the upper half-plane, and consequently
$0 < \varphi < \pi$. The expression for $x_n$ now becomes

$$x_n = 2a r^n \cos(n\varphi + \delta) + c_3 z_3^n + \cdots + c_N z_N^n.$$

This may be written

(7-19) $$x_n = 2a r^n [\cos(n\varphi + \delta) + \theta_n],$$

where

$$\theta_n = \frac{1}{2a} \left[ c_3 \left(\frac{z_3}{r}\right)^n + \cdots + c_N \left(\frac{z_N}{r}\right)^n \right],$$

Figure 7.5

and hence, for some suitable constant $C$,

$$|\theta_n| \leq C t^n,$$

where $t$ denotes the largest of the ratios $|z_k/r|$, $k = 3, \ldots, N$. In view of
(7-18), the number $t$ is less than 1, and hence

(7-20) $$\lim_{n \to \infty} \theta_n = 0.$$

Our problem is to recover the quantities $r$ and $\varphi$ from the sequence
$\{x_n\}$. To find its solution, let us begin by assuming that $\theta_n = 0$. We
recall that the sequence $\{x_n\}$ with the elements

$$x_n = 2a r^n \cos(n\varphi + \delta)$$

is a solution of the difference equation

(7-21) $$x_n + A x_{n-1} + B x_{n-2} = 0,$$

where $B = r^2$, $A = -2r \cos\varphi$ (see problem 3, chapter 6). To determine
the coefficients $A$ and $B$ from the known solution $\{x_n\}$, we observe that
equation (7-21) together with the corresponding equation with $n$ increased
by 1,

$$x_{n+1} + A x_n + B x_{n-1} = 0,$$

represents a system of two linear equations for the two unknowns $A$ and $B$.
The determinant

(7-22) $$D_n = \begin{vmatrix} x_{n-1} & x_{n-2} \\ x_n & x_{n-1} \end{vmatrix} = x_{n-1}^2 - x_n x_{n-2}$$

equals, by a trigonometric identity,

$$4a^2 r^{2n-2} \{ \cos^2[(n-1)\varphi + \delta] - \cos(n\varphi + \delta) \cos[(n-2)\varphi + \delta] \} = 4a^2 r^{2n-2} \sin^2\varphi,$$

and hence is different from zero, as $0 < \varphi < \pi$. We may thus solve for
$A$ and $B$, finding

$$A = -\frac{E_n}{D_n}, \quad B = \frac{D_{n+1}}{D_n},$$

where

(7-23) $$E_n = \begin{vmatrix} x_n & x_{n-2} \\ x_{n+1} & x_{n-1} \end{vmatrix} = x_n x_{n-1} - x_{n+1} x_{n-2}.$$

The desired quantities $r$ and $\varphi$ can now be found by

(7-24) $$r = \sqrt{B} = \sqrt{\frac{D_{n+1}}{D_n}}, \quad \cos\varphi = -\frac{A}{2r} = \frac{E_n}{2 r D_n}.$$

The relations (7-24) solve our problem if $\{x_n\}$ is a solution of the
difference equation (7-2) such that $\theta_n = 0$ in (7-19). If $\{x_n\}$ is any solution
of (7-2), $\theta_n \neq 0$ in general; however, using (7-20) it is not too hard to show
that

(7-25) $$\frac{D_{n+1}}{D_n} \to r^2, \quad \frac{E_n}{2 D_n} \to r \cos\varphi$$

as $n \to \infty$. We thus have obtained the convergence of the following
algorithm for the determination of a pair of complex conjugate dominant
zeros.

Algorithm 7.5 From the solution $\{x_n\}$ of the difference equation
(7-2), calculate the determinants $D_n$ and $E_n$ defined by (7-22) and
(7-23). With these, form the sequence of ratios $D_{n+1}/D_n$ and
$E_n/2D_n$.

The corresponding convergence theorem is as follows:

Theorem 7.5 Let the polynomial (7-1) have exactly two dominant
zeros $z_{1,2} = r e^{\pm i\varphi}$, both of multiplicity one, where $0 < \varphi < \pi$. If the
solution of the difference equation (7-2) is such that $c_1 \neq 0$ in (7-3),
then the limit relations (7-25) hold.


EXAMPLE

5. For the polynomial

$$p(z) = 81z^4 - 108z^3 + 24z + 20$$

(exact dominant zeros: $z_{1,2} = 1 \pm i/3$) the computation proceeds as in
table 7.5. It is evidently unnecessary to calculate the determinants $D_n$
and $E_n$ from the beginning.

Algorithm 7.5 has the disadvantage of not being very accurate when $\varphi$,
the argument of the dominant zero, is small. Let us assume that we have
determined $r$ accurately and $x = r \cos\varphi = \lim E_n/2D_n$ with an error $dx$.
We then have to calculate $\varphi$ from the relation

$$\cos\varphi = \frac{x}{r}$$

and find for the differential of $\varphi$

$$d\varphi = -\frac{1}{r \sin\varphi} \, dx.$$

For small values of $\varphi$ this may be quite large even for a small value of $dx$.


Problems

14. Determine the dominant zeros of the polynomial

$$p(z) = z^3 - z^2 + 2$$

to three decimal places.

Table 7.5

      x_n          D_n      D_{n+1}/D_n      E_n       E_n/2D_n
   1.333333
   1.777778
   2.074074
   2.123457
   1.975309
   1.580247
   0.965707
   0.178022
  -0.718589
  -1.634438
  -2.470444
  -3.124966
  -3.504914     1.229324        …         2.458740        …
  -3.537670        …         1.111125        …            …
  -3.180991     1.517807     1.111095        …         0.999990
  -2.431232     1.686428     1.111115     3.372868     1.000003
  -1.328033     1.873816     1.111111     3.747630     0.999999
   0.045304     2.082018     1.111113     4.164035     1.000000
   1.566200     2.313358                  4.626705
                             1.111111                  1.000000

15. Starting from the exact representation (7-19) of $x_n$, prove the relations
(7-25).
16. Devise a modification of Bernoulli’s method for the case that the
polynomial p has two real dominant zeros, both of multiplicity one, of opposite
signs, and apply your algorithm to the polynomial

$p(z) = z^4 - 2z^3 - 2z^2 + 8z - 8.$


7.6 Sign Waves†

† This section may be omitted at first reading.

How can we detect a pair of conjugate complex zeros? Formula
(7-19) shows that in the presence of a pair of conjugate complex dominant
zeros the elements of the sequence $\{x_n\}$ behave like

(7-26) $x_n = 2 a r^n \cos(n\varphi + \delta).$

This equation implies, in particular, that the signs of the $x_n$ oscillate.
Moreover, by looking at the frequency with which these oscillations occur,
we can hope to be able to make a statement about the angle φ. Clearly,
when φ is small, the “sign waves” are long, and when φ is close to π, they
are short. (In the extreme case, where φ = π, plus and minus signs
alternate in strict order.)

In order to establish a more precise result, let us assume that the
polynomial p has degree 2. Equation (7-26) then is exact; we now write
it in the form

(7-27) $x_n = 2 a r^n \sin\left(n\varphi + \delta + \frac{\pi}{2}\right).$

If n were a continuous variable, the signs of $x_n$ would change from minus
to plus whenever

$n\varphi + \delta + \frac{\pi}{2} = 2k\pi,$

where k is any integer. Let now, for any positive integer k, $n_k$ be the first
integer following the kth change of sign from minus to plus of the sine
function in (7-27), i.e., let

(7-28) $n_k = \frac{2k\pi - \delta - \frac{\pi}{2}}{\varphi} + \theta_k,$

where $0 \le \theta_k < 1$. This means that for each positive integer k,

$\operatorname{sign} x_{n_k - 1} = -1, \qquad \operatorname{sign} x_{n_k} = 0 \text{ or } 1.$

Now subtract from (7-28) the corresponding relation for k = 0. We
find, after dividing by k,

$\frac{n_k - n_0}{k} = \frac{2\pi}{\varphi} + \frac{\theta_k - \theta_0}{k}.$

We now consider the limit of this expression as $k \to \infty$. Since the $\theta_k$ are
numbers between zero and one,

$\lim_{k \to \infty} \frac{\theta_k - \theta_0}{k} = 0.$

It follows that the limit

$T = \lim_{k \to \infty} \frac{n_k - n_0}{k}$

exists and has the value $2\pi/\varphi$. Solving for φ, we find

(7-29) $\varphi = \frac{2\pi}{T}.$

The limit T appearing here has a very simple interpretation. The
integer $n_k - n_{k-1}$ represents the length or period of the kth sign wave in
the sequence of signs of the $x_n$. By virtue of the identity

$n_k - n_0 = (n_k - n_{k-1}) + (n_{k-1} - n_{k-2}) + \cdots + (n_1 - n_0),$

the expression $n_k - n_0$ is the sum of the periods of all sign waves from the
first to the kth. Hence T is just the average period of all sign waves.

We thus have obtained the following algorithm for the determination of 
the arguments of a pair of complex conjugate dominant zeros. 


Algorithm 7.6 In the sequence $\{x_n\}$ generated by algorithm 7.1,
delete everything but the signs† of the $x_n$. For k = 0, 1, 2, ..., let
$n_k$ be the index of the kth + which is preceded by a −, and calculate
the average value of all sign wave periods $n_k - n_{k-1}$.

Although our analysis applies only to polynomials of degree 2, the
following result can be proved for polynomials of arbitrary degree
(Brown [1962]).


Theorem 7.6 Under the hypotheses of theorem 7.5, the argument φ is
given by (7-29), where T denotes the average sign wave period.


In electrical engineering, the quantity 2π/T is called the circular frequency
of an alternating current with period T. In this sense we can say that φ is
equal to the circular frequency of the sequence of signs of the $x_n$.

EXAMPLE

6. For the polynomial

$p(z) = z^4 - 8z^3 + 39z^2 - 62z + 50 = (z^2 - 2z + 2)(z^2 - 6z + 25)$

the sequence of signs of the $x_n$ turned out as follows:

[sequence of + and − signs; not legible in the source]

The average length of the periods of the full sign waves recorded above is
6.75. We thus find the approximate value

$\varphi \approx \frac{2\pi}{6.75} \approx 53.3°.$

The exact value is

$\varphi = \operatorname{arctg} \frac{4}{3} \approx 53.1°.$

† A zero may be considered positive or negative.



It is only fair to point out that, in spite of the favorable showing in 
example 6, the convergence of algorithm 7.6 is rather slow in general. 
The algorithm is not useful for purposes of exact numerical computation, 
but it does give very quickly, with almost no computation, an idea about 
the location of a pair of complex conjugate zeros. 
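
As a rough illustration, the crude estimate of algorithm 7.6 can be compared with the determinant ratios of algorithm 7.5 on the polynomial of example 6. This is only a sketch under stated assumptions: the function names are invented here, the sequence is started with $x_0 = 1$ and zeros before it (the text leaves the starting values to algorithm 7.1), and a zero element is counted as positive.

```python
import math

def bernoulli_sequence(coeffs, count):
    """x_0, ..., x_{count-1} solving a0*x_n + a1*x_{n-1} + ... + aN*x_{n-N} = 0.

    Assumed start: x_0 = 1 and x_n = 0 for n < 0.
    """
    a0, rest = coeffs[0], coeffs[1:]
    xs = [1.0]
    while len(xs) < count:
        n = len(xs)
        acc = sum(a * xs[n - j] for j, a in enumerate(rest, start=1) if n - j >= 0)
        xs.append(-acc / a0)
    return xs

def sign_wave_angle(xs):
    """Algorithm 7.6: phi = 2*pi/T, T the average minus-to-plus period."""
    starts = [n for n in range(1, len(xs)) if xs[n - 1] < 0 <= xs[n]]
    if len(starts) < 2:
        raise ValueError("need at least one full sign wave")
    period = (starts[-1] - starts[0]) / (len(starts) - 1)
    return 2 * math.pi / period

def determinant_ratios(xs, n):
    """Algorithm 7.5: (D_{n+1}/D_n, E_n/(2*D_n)), estimates of r^2 and r*cos(phi)."""
    def d(m):
        return xs[m - 1] ** 2 - xs[m] * xs[m - 2]          # (7-22)
    e = xs[n] * xs[n - 1] - xs[n + 1] * xs[n - 2]           # (7-23)
    return d(n + 1) / d(n), e / (2 * d(n))

xs = bernoulli_sequence([1, -8, 39, -62, 50], 300)
r_squared, r_cos_phi = determinant_ratios(xs, 40)
print("sign waves: phi ~", math.degrees(sign_wave_angle(xs)), "degrees")
print("algorithm 7.5: r^2 ~", r_squared, " r cos(phi) ~", r_cos_phi)
```

The dominant zeros of this polynomial are 3 ± 4i, so the second line should approach $r^2 = 25$ and $r\cos\varphi = 3$, while the sign-wave estimate only settles near arctg(4/3) slowly, in line with the remark above.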


Problems 


17. Bernoulli’s method is applied unsuspectingly to the polynomial

$p(z) = z^4 + 2z^3 + 4z^2 - 2z - 5.$

Show that the polynomial has a pair of complex conjugate dominant
zeros. In which quadrants of the plane do they lie?
18. Algorithm 7.6 fails for the polynomial

$p(z) = z^3 - 2z^2 + 2z - 1.$

Explain! [Hint: How many dominant zeros are there?]
19. Devise an algorithm for removing a quadratic factor $z^2 - uz - v$ from a
given polynomial. [Hint: Write $p(z) = (z^2 - uz - v)q(z)$ and compare
coefficients of like powers of z, as in §4.10.]
20. Verify that $z^2 - 3.711245z + 3.728699$ is (approximately) a quadratic
factor of the polynomial

$p(z) = z^5 - 3z^4 - 20z^3 + 60z^2 - z - 78$

and remove it from p by the algorithm devised in problem 19.


Recommended Reading 


Bernoulli’s method, and some extensions of it, are dealt with by Aitken 
[1926]; other accounts are given in Householder [1953] and Hildebrand 
[1956], pp. 458-462. An entirely different procedure for determining the 
dominant zeros of a polynomial is known as Graeffe’s method. It is 
dealt with in the above standard numerical analysis texts; see also 
Ostrowski [1940] and Bareiss [1960]. A complete account of modern 
automatic procedures for polynomials is given by Wilkinson [1959]. 


Research Problems 


1. How does the presence of more than two dominant zeros manifest
itself in the sequence of signs of the $x_n$, and how can the arguments be
recovered?

2. Suppose a polynomial has exactly three dominant zeros, one real, two 
complex conjugate. Devise a modification of Bernoulli’s method that 
deals with this situation. 


chapter 8 the quotient-difference algorithm


Bernoulli’s method has the disadvantage of furnishing only the dominant 
zeros of a polynomial. If it is desired to compute a non-dominant zero by 
Bernoulli’s method, it is necessary first to compute all larger zeros, and 
then to remove them from the polynomial by the method of §4.10. Only
rarely will these zeros be known exactly. Thus the successive deflations
will have a tendency to falsify the remaining zeros. We now shall discuss
a modern extension of Bernoulli’s method, due to Rutishauser, which has 
the advantage of providing simultaneous approximations to all zeros. 
Since the prerequisites for this volume do not include complex function 
theory, we are unable to provide the proofs for the convergence theorems 
in this chapter. But even though its theoretical background cannot be 
fully exposed, we feel that Rutishauser’s algorithm is of sufficient interest 
to warrant its presentation at this point. 


8.1 The Quotient-Difference Scheme 


The Quotient-Difference (QD) algorithm can be looked at as a generalization
of Bernoulli’s method. As in chapter 7, we are given a polynomial

(8-1) $p(z) = a_0 z^N + a_1 z^{N-1} + \cdots + a_N$

and form a solution of the associated difference equation

(8-2) $a_0 x_n + a_1 x_{n-1} + \cdots + a_N x_{n-N} = 0.$

The sequence $\{x_n\}$ may for instance be started by setting

(8-3) $x_{-N+1} = x_{-N+2} = \cdots = x_{-1} = 0, \quad x_0 = 1.$

In chapter 7 we now formed the quotients

(8-4) $q_n = \frac{x_{n+1}}{x_n}.$

If the polynomial p has a single dominant zero, then, as was shown in
chapter 7, the sequence $\{q_n\}$ converges to it.

The elements of the sequence $\{q_n\}$ will now be denoted by $q_n^{(1)}$; they form
the first column of the two-dimensional scheme called the Quotient-Difference
(QD) scheme. The elements of the remaining columns are
conventionally denoted by $e_n^{(1)}, q_n^{(2)}, e_n^{(2)}, q_n^{(3)}, \ldots, e_n^{(N-1)}, q_n^{(N)}$ and are
generated by alternately forming differences and quotients, as follows:

(8-5a) $e_n^{(k)} = (q_{n+1}^{(k)} - q_n^{(k)}) + e_{n+1}^{(k-1)},$

(8-5b) $q_n^{(k+1)} = \frac{e_{n+1}^{(k)}}{e_n^{(k)}}\, q_{n+1}^{(k)},$

where k = 1, 2, ..., N − 1, n = 0, 1, 2, .... In (8-5a) we set $e_{n+1}^{(0)} = 0$
when k = 1. The number of q columns formed is equal to the degree of
the given polynomial.


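EXAMPLE

1. For N = 4, the general QD scheme looks as follows:

$$
\begin{array}{ccccccccc}
 & q_0^{(1)} & & & & & & \\
0 & & e_0^{(1)} & & & & & \\
 & q_1^{(1)} & & q_0^{(2)} & & & & \\
0 & & e_1^{(1)} & & e_0^{(2)} & & & \\
 & q_2^{(1)} & & q_1^{(2)} & & q_0^{(3)} & & \\
0 & & e_2^{(1)} & & e_1^{(2)} & & e_0^{(3)} & \\
 & q_3^{(1)} & & q_2^{(2)} & & q_1^{(3)} & & q_0^{(4)} \\
0 & & e_3^{(1)} & & e_2^{(2)} & & e_1^{(3)} & \\
 & q_4^{(1)} & & q_3^{(2)} & & q_2^{(3)} & & q_1^{(4)} \\
\vdots & & e_4^{(1)} & & e_3^{(2)} & & e_2^{(3)} & \vdots
\end{array}
$$

Scheme 8.1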


In each column of the scheme the superscripts are constant, and in each
diagonal the subscripts. The rules (8-5) can be memorized by observing
that in each of the rhombus-like configurations shown in the scheme
either the sums or the products of the SW and of the NE pair of elements
are equal. If a rhombus is centered in a q column, sums are equal; if it is
centered in an e column, products are equal. In view of this interpretation
the formulas (8-5) are occasionally referred to as the rhombus rules.

The QD scheme can be described in yet another way if we introduce, in
addition to the forward difference operator Δ already introduced in §4.4,
the quotient operator Q defined by

$Q x_n = \frac{x_{n+1}}{x_n}.$

The relations (8-5) then can be written more compactly thus:

(8-6) $e_n^{(k)} = e_{n+1}^{(k-1)} + \Delta q_n^{(k)}, \qquad q_n^{(k+1)} = q_{n+1}^{(k)}\, Q e_n^{(k)}.$

Here it must be understood that the operators Δ and Q act on the subscript.


EXAMPLE

2. We conclude this section with the numerical QD scheme for the
polynomial

$p(z) = z^2 - z - 1.$

The sequence $\{x_n\}$ is the Fibonacci sequence (see example 7, §6.3).

Table 8.1

 x_n   e_n^(0)    q_n^(1)      e_n^(1)      q_n^(2)      e_n^(2)
  1      0
                 1.000000
  1      0                    1.000000
                 2.000000                 -1.000000
  2      0                   -0.500000                 -0.000001
                 1.500000                 -0.500001
  3      0                    0.166667                 -0.000001
                 1.666667                 -0.666669
  5      0                   -0.066667                  0.000002
                 1.600000                 -0.600000
  8      0                    0.025000                  0.000025
                 1.625000                 -0.624975
 13      0                   -0.009615                 -0.000049
                 1.615385                 -0.615409
 21      0                    0.003663                 -0.000171
                 1.619048                 -0.619243
 34      0                   -0.001401
                 1.617647
 55      0

               (1 + √5)/2                      ?
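
The rhombus rules (8-5) can be tried out directly on the Fibonacci sequence. The following is a minimal sketch, not the book's procedure verbatim: it assumes exact rational arithmetic (Python's `fractions` module) so that no rounding enters, and the function name and column layout are choices made here for illustration.

```python
from fractions import Fraction

def qd_columns(xs, n_columns):
    """Build QD columns left to right from x_0, x_1, ... by the rules (8-5).

    Returns [q1, e1, q2, e2, ...]; each column is a list indexed by n.
    """
    q = [Fraction(xs[n + 1]) / xs[n] for n in range(len(xs) - 1)]  # (8-4)
    e_prev = [Fraction(0)] * (len(q) + 1)                          # e_n^(0) = 0
    columns = []
    for k in range(1, n_columns + 1):
        columns.append(q)
        # (8-5a): e_n^(k) = (q_{n+1}^(k) - q_n^(k)) + e_{n+1}^(k-1)
        e = [q[n + 1] - q[n] + e_prev[n + 1] for n in range(len(q) - 1)]
        columns.append(e)
        if k == n_columns or any(v == 0 for v in e):
            break  # a zero e entry makes the next quotient undefined
        # (8-5b): q_n^(k+1) = (e_{n+1}^(k) / e_n^(k)) * q_{n+1}^(k)
        q = [e[n + 1] / e[n] * q[n + 1] for n in range(len(e) - 1)]
        e_prev = e
    return columns

fibonacci = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
q1, e1, q2, e2 = qd_columns(fibonacci, 2)
print([float(v) for v in q1[:4]])
print([float(v) for v in q2[:4]])
print(e2)
```

In exact arithmetic the last column comes out identically zero, whereas the six-decimal computation of table 8.1 shows small nonzero entries there.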


Problems

1. Generate the QD scheme for the polynomial $p(z) = z^3 + 5z^2 + 9z + 5$.
2. Let $x_n = n!$, n = 0, 1, 2, .... Determine the QD scheme corresponding
to the sequence $\{x_n\}$ (a) numerically, (b) analytically. (The sequence $\{x_n\}$
does not arise as solution of a difference equation in this case.)
3. Give analytical formulas for the entries of the QD scheme, if $x_n = 1 + q^n$,
where $0 < |q| < 1$. Show that

$\lim_{n\to\infty} q_n^{(1)} = 1, \qquad \lim_{n\to\infty} e_n^{(1)} = 0, \qquad \lim_{n\to\infty} q_n^{(2)} = q,$

and that $e_n^{(2)} = 0$ for all n.

8.2 Existence of the QD Scheme 


Evidently the QD scheme fails to exist if a coefficient $e_n^{(k)}$ with 0 < k < N
becomes zero, and it is easy to construct examples for which this actually
occurs. Another, trivial, case of nonexistence of the scheme arises when an
$x_n$ becomes accidentally zero. It appears to be difficult to state explicit
necessary and sufficient conditions for the existence of the scheme in terms
of the polynomial p. In terms of the sequence $\{x_n\}$, a necessary and
sufficient condition is that the determinants

(8-7) $H_n^{(k)} = \begin{vmatrix} x_n & x_{n+1} & \cdots & x_{n+k-1} \\ x_{n+1} & x_{n+2} & \cdots & x_{n+k} \\ \vdots & \vdots & & \vdots \\ x_{n+k-1} & x_{n+k} & \cdots & x_{n+2k-2} \end{vmatrix}$

should be different from zero for k = 1, 2, ..., N and for n = 0, 1, 2, ....
It is possible to state simple sufficient (but not necessary) conditions for this
to be the case. Among them are the following:

(i) The zeros $z_1, z_2, \ldots, z_N$ of p are positive, and the sequence $\{x_n\}$ is
started by algorithm 7.4.

(ii) The zeros $z_1, z_2, \ldots, z_N$ of p are simple (but not necessarily real) and
have distinct absolute values:

(8-8) $|z_1| > |z_2| > \cdots > |z_N| > 0.$

In case (ii) we can assert only that $H_n^{(k)} \ne 0$, and consequently $e_n^{(k)} \ne 0$,
for all sufficiently large values of n.

There is a good deal of numerical evidence that the QD scheme exists in
many cases even if neither of the above sufficient conditions is satisfied,
for instance if p is a polynomial with real coefficients having pairs of
complex conjugate zeros.
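
The existence condition in terms of the determinants (8-7) is easy to test for a given sequence. The sketch below is an illustration only: the helper names are invented here, exact rational arithmetic is assumed so that "different from zero" can be decided without rounding, and the Fibonacci sequence of example 2 serves as test data.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # a zero column: the determinant vanishes
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            sign = -sign                # a row swap flips the sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return sign * result

def hankel(xs, n, k):
    """H_n^(k) of (8-7) for a sequence stored as xs[0], xs[1], ..."""
    return det([[xs[n + i + j] for j in range(k)] for i in range(k)])

fibonacci = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
for n in range(4):
    print(n, hankel(fibonacci, n, 1), hankel(fibonacci, n, 2), hankel(fibonacci, n, 3))
```

For this sequence the determinants with k = 1 and k = 2 never vanish, while those with k = 3 are all zero, as one expects for a polynomial of degree N = 2.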


Problems

4. Show that

$q_n^{(k)} = \frac{H_{n+1}^{(k)} H_n^{(k-1)}}{H_n^{(k)} H_{n+1}^{(k-1)}}, \qquad e_n^{(k)} = \frac{H_n^{(k+1)} H_{n+1}^{(k-1)}}{H_n^{(k)} H_{n+1}^{(k)}}.$

5. Show that the determinants $H_n^{(1)}$ and $H_n^{(2)}$ formed with the elements
$x_n = 1 + q^n$, where $0 < |q| < 1$, are always different from zero.

6. Let $x_n = r^n \cos(n\varphi + \delta)$, where $0 < \varphi < \pi$. Show that the corresponding
determinants $H_n^{(2)}$ are always different from zero. What are
the conditions on φ and δ in order that $H_n^{(1)} \ne 0$ for all n?

8.3 Convergence Theorems 


If the QD scheme exists, some remarkable statements are possible
about the limits of its elements as $n \to \infty$. The simplest situation arises
if the zeros of the polynomial p satisfy (8-8). We then have

Theorem 8.3a Under the conditions just stated,

(8-9) $\lim_{n\to\infty} q_n^{(k)} = z_k, \qquad k = 1, 2, \ldots, N,$

i.e., the kth q-column of the QD scheme converges to the kth zero of
the polynomial.

It follows from (8-9) by virtue of (8-5a) that

$\lim_{n\to\infty} e_n^{(1)} = 0,$

and from this we get easily by induction

(8-10) $\lim_{n\to\infty} e_n^{(k)} = 0, \qquad k = 1, 2, \ldots, N - 1.$

Thus, under the condition (8-8), all e-columns of the QD scheme tend to
zero.

EXAMPLE

3. In table 8.1 the column headed $q_n^{(2)}$ tends to $(1 - \sqrt{5})/2$, the smaller
zero of $p(z) = z^2 - z - 1$. (Concerning the column $e_n^{(2)}$, see §8.4.)

If several zeros of p have the same absolute value (this happens, for
instance, every time when a polynomial with real coefficients has a pair of
complex conjugate zeros), the convergence properties of the scheme are
more complicated. We still assume that the zeros are numbered such that

(8-11) $|z_1| \ge |z_2| \ge \cdots \ge |z_N| > 0.$

For convenience in formulating some of the conditions below, we shall put
$|z_0| = \infty$, $|z_{N+1}| = 0$.

Always assuming that the scheme exists, we then have

Theorem 8.3b For every k such that $|z_{k+1}| < |z_k| < |z_{k-1}|$,

(8-12) $\lim_{n\to\infty} q_n^{(k)} = z_k.$

For every k such that $|z_k| > |z_{k+1}|$,

(8-13) $\lim_{n\to\infty} e_n^{(k)} = 0.$

These facts can be used in the following manner: The e columns which
tend to zero (a behavior which is numerically conspicuous) divide the QD
table into subtables. All zeros $z_k$ whose subscripts agree with the superscripts
of the q’s in one subtable have the same modulus. Thus, if a $z_k$ is
the only zero of its modulus, this will be evident from the fact that the
corresponding subtable contains one q column only, and the value of $z_k$
can be obtained as the limit of that q column.

It is not yet clear how to deal with several zeros having the same
absolute value. (Most frequently this situation occurs in connection with
complex conjugate zeros of real polynomials.) Such zeros, too, can be
obtained from the QD table. We first consider the general case where m
zeros $z_{k+1}, z_{k+2}, \ldots, z_{k+m}$ have the same modulus:

(8-14) $|z_k| > |z_{k+1}| = |z_{k+2}| = \cdots = |z_{k+m}| > |z_{k+m+1}|.$

Here it is necessary to construct polynomials $p_n^{(l)}$, l = k, k + 1, ..., k + m,
by means of the recurrence relations

(8-15a) $p_n^{(k)}(z) = 1, \qquad n = 0, 1, 2, \ldots,$

(8-15b) $p_n^{(l+1)}(z) = z\, p_{n+1}^{(l)}(z) - q_n^{(l+1)}\, p_n^{(l)}(z), \qquad l = k, k + 1, \ldots, k + m - 1.$

These polynomials can again be thought of as being arranged in a two-dimensional
array. Scheme 8.3 shows a segment of this array for m = 2.

$$
\begin{array}{ccc}
1 & & \\
 & p_n^{(k+1)} & \\
1 & & p_n^{(k+2)} \\
 & p_{n+1}^{(k+1)} & \\
1 & &
\end{array}
$$

Scheme 8.3


The zeros $z_{k+1}, z_{k+2}, \ldots, z_{k+m}$ can now be obtained from the polynomials
$p_n^{(k+m)}$ by virtue of the following theorem:

Theorem 8.3c If the zeros satisfy (8-14), then for each fixed z

(8-16) $\lim_{n\to\infty} p_n^{(k+m)}(z) = (z - z_{k+1})(z - z_{k+2}) \cdots (z - z_{k+m}),$

i.e., the coefficients of the polynomials $p_n^{(k+m)}$ tend for $n \to \infty$ to the
coefficients of the polynomial with zeros $z_{k+1}, \ldots, z_{k+m}$ and leading
coefficient 1.


For m = 1 theorem 8.3c reduces to relation (8-12). For m = 2 (the
practically most frequent case) the converging polynomials are given by

$p_n^{(k+2)}(z) = z[z - q_{n+1}^{(k+1)}] - q_n^{(k+2)}[z - q_n^{(k+1)}] = z^2 - (q_{n+1}^{(k+1)} + q_n^{(k+2)})\, z + q_n^{(k+1)} q_n^{(k+2)}.$

Relation (8-16) here means that the limits

(8-17a) $\lim_{n\to\infty} (q_{n+1}^{(k+1)} + q_n^{(k+2)}) = A_k,$

(8-17b) $\lim_{n\to\infty} q_n^{(k+1)} q_n^{(k+2)} = B_k$

exist, and that the polynomial $z^2 - A_k z + B_k$ has the zeros $z_{k+1}$ and
$z_{k+2}$.

We finally mention the following fact, which could (and will) play the
role of a computational check.

Theorem 8.3d The quantities $e_n^{(N)}$ calculated from (8-5a) with k = N
are identically zero.

Examples illustrating the above theorems will be given in §8.5.


Problems

7. Prove theorem 8.3a in the case of a polynomial of degree N = 2 whose
zeros satisfy $|z_1| > |z_2| > 0$.
8. Show that

$q_{n+1}^{(1)} + q_n^{(2)} = \frac{E_{n+2}}{D_{n+2}}, \qquad q_n^{(1)} q_n^{(2)} = \frac{D_{n+3}}{D_{n+2}},$

where $D_n$ and $E_n$ denote the determinants defined by (7-22) and (7-23).
Thus conclude that in the case $z_2 = \bar{z}_1$ theorem 8.3c is equivalent to
theorem 7.5.
9. Obtain the two dominant complex conjugate zeros of the polynomial

$p(z) = 81z^4 - 108z^3 + 24z + 20$

from the QD scheme by using the relations (8-17) with k = 0.
10. Assuming that the quantities $x_n$ are given by (7-17), prove theorem 8.3d
for polynomials of degree N = 1 and N = 2.

8.4 Numerical Instability 


As described above, the QD scheme is built up proceeding from the left
to the right. The sequence $\{x_n\}$ determines the first q column $\{q_n^{(1)}\}$; from
it we obtain in succession the columns $\{e_n^{(1)}\}, \{q_n^{(2)}\}, \ldots$, by means of the
relations (8-5). The reader will be shocked to learn that this method of
generating the QD scheme is not feasible in practice, because it suffers
from numerical instability due to severe loss of significant digits.

The concept of numerical instability was already briefly alluded to in
§1.5. In mathematical analysis, real (or complex) numbers are always
conceived as being determined with infinite accuracy. Mathematically,
this infinite accuracy is expressed by the Dedekind cut property (see
Taylor [1959], p. 447; Buck [1956], p. 388). Numerically, the real
numbers of mathematical analysis can be thought of as infinite decimal
fractions. In computation a real number z is, with rare exceptions,
never represented exactly, but rather approximated by some rational
number z*, e.g., by a decimal fraction with eight decimals.† The quantity
z* − z is called rounding error, or sometimes also absolute rounding error,
in order to distinguish it from the relative rounding error |z* − z|/|z|.
The following simple rules concerning the propagation of rounding errors
follow easily from first principles:

(i) A sum or difference of two rounded numbers a* and b* has an
absolute rounding error of the order of the sum of the absolute errors of
a* and b*.

(ii) A product or quotient of two rounded numbers a* and b* has a
relative error of the order of the sum of the relative errors of a* and b*.

The QD scheme offers an interesting illustration of these rules. To
simplify matters, let us assume that the sequence $\{x_n\}$ is generated without
rounding error (this is possible if p is a polynomial with integer coefficients
and leading coefficient 1), and that the situation covered by theorem 8.3a
obtains. If we carry t decimals, the numbers $q_n^{(1)}$ then will have (absolute)
rounding errors of the order of $10^{-t}$. The elements $e_n^{(1)}$, formed by
differencing the $q_n^{(1)}$, will by rule (i) have absolute rounding errors of the
same magnitude. However, since by (8-10) the $e_n^{(1)}$ tend to zero, their
relative errors become larger and larger. Thus, by rule (ii), the ratios
$e_{n+1}^{(1)}/e_n^{(1)}$ are formed less and less accurately, and, again by rule (ii), the
relative error of the quantity $q_n^{(2)}$, as determined by (8-5b), increases without
bound. Since $q_n^{(2)}$ tends to a nonzero limit as $n \to \infty$, the same is
true for the absolute error of $q_n^{(2)}$. It is clear that this numerical instability
becomes even more pronounced as k increases.
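
The effect can be observed with a few lines of code. The sketch below is an illustration under assumptions of its own: each entry is rounded to a fixed number of decimals with Python's built-in `round` (which need not agree in the last digit with a hand computation), the function name is invented, and the Fibonacci sequence is used because the column $e_n^{(2)}$ should then be exactly zero.

```python
def qd_second_e_column(xs, decimals):
    """Column e^(2) of the QD scheme built column by column, rounding every entry."""
    def r(value):
        return round(value, decimals)
    q1 = [r(xs[n + 1] / xs[n]) for n in range(len(xs) - 1)]
    e1 = [r(q1[n + 1] - q1[n]) for n in range(len(q1) - 1)]              # (8-5a), k = 1
    q2 = [r(e1[n + 1] / e1[n] * q1[n + 1]) for n in range(len(e1) - 1)]  # (8-5b)
    e2 = [r(q2[n + 1] - q2[n] + e1[n + 1]) for n in range(len(q2) - 1)]  # (8-5a), k = 2
    return e2

fibonacci = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
for decimals in (6, 12):
    column = qd_second_e_column(fibonacci, decimals)
    print(decimals, ["%.1e" % v for v in column])
```

With six decimals the spurious entries grow well beyond the rounding unit as n increases; carrying twelve decimals shrinks them but shows the same growth pattern.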


EXAMPLE

4. The loss of significant digits is illustrated already in example 2. By
theorem 8.3d, the column $e_n^{(2)}$ should theoretically consist of zeros. The
fact that these elements are not zero, and even increase with increasing n,
shows the growing influence of rounding errors. The reader is asked to
calculate the same example using numbers with ten decimals. This will
delay, but not ultimately prevent, the phenomenon of numerical instability.

The fact that the method of generating the QD scheme described in §8.1
is unstable does not, of course, prevent the theorems stated in §8.3 from
being true. These theorems concern the mathematically exact, unrounded
QD scheme. They can be used numerically as soon as we succeed in
generating the QD scheme in a numerically stable manner. One method
of avoiding rounding errors, applicable to polynomials with rational
coefficients, would be to perform all operations in exact rational arithmetic
(see Henrici [1956]). Fortunately, as will be seen in the following section,
there is a simple way of generating a stable QD scheme also in conventional
arithmetic.

† For more details the reader is referred to chapter 15. At the present stage we aim
at a qualitative rather than quantitative understanding of rounding errors.


8.5 Progressive Form of the Algorithm


The QD scheme can be generated in a stable manner if it is built up
row by row instead of column by column, as in §8.1. To see this, we
solve each of the recurrence relations (8-5) for the south element of the
rhombus involved:

(8-18a) $q_{n+1}^{(k)} = q_n^{(k)} + e_n^{(k)} - e_{n+1}^{(k-1)},$

(8-18b) $e_{n+1}^{(k)} = \frac{q_n^{(k+1)}}{q_{n+1}^{(k)}}\, e_n^{(k)}.$

Let us assume that a row of q’s and a row of e’s is known, each affected by
“normal” rounding errors of several digits in the last place. The new
row of q’s, as calculated from (8-18a), will then, by rule (i), have absolute
errors of the same magnitude. The fact that the relative errors in the e’s
are large (due to the smallness of the e’s) is not important now. Furthermore,
the relative errors in the new row of e’s determined by (8-18b) are
of the same order as in the old row of e’s, due to rule (ii). While a normal
amount of error propagation must be expected also in the present mode
of generating the scheme, it is much less serious than when the scheme is
generated column by column.

If the scheme is to be generated row by row, a first couple of rows must
somehow be obtained. The following algorithm shows how this is
accomplished.

Algorithm 8.5 Let $a_0, a_1, \ldots, a_N$ be constants, all different from zero.
Set

(8-19a) $q_0^{(1)} = -\frac{a_1}{a_0}, \qquad q_{1-k}^{(k)} = 0, \quad k = 2, 3, \ldots, N;$

(8-19b) $e_{1-k}^{(k)} = \frac{a_{k+1}}{a_k}, \qquad k = 1, 2, \ldots, N - 1.$

Consider the elements thus generated as the first two rows of a QD
scheme, and generate further rows by means of (8-18), using the side
conditions

(8-20) $e_n^{(0)} = e_n^{(N)} = 0, \qquad n = 1, 2, \ldots.$

As before, there is the theoretical possibility of breakdown of the scheme
due to the fact that a denominator is zero. However, we have

Theorem 8.5 If the scheme of the elements $q_n^{(k)}$ and $e_n^{(k)}$ defined by
algorithm 8.5 exists, it is (for $n \ge 0$) identical with the QD scheme of
the polynomial

$p(z) = a_0 z^N + a_1 z^{N-1} + \cdots + a_N,$

where the sequence $\{x_n\}$ is started in the manner (8-3).

The proof of theorem 8.5 requires some involved algebra and is
omitted here. A necessary and sufficient condition for the existence of
the scheme defined by algorithm 8.5 is that the determinants (8-7) be
different from zero also for n > −k. (Here we have to interpret $x_n = 0$
for n < 0.)

EXAMPLES

5. The top rows of the scheme for N = 4 look as follows:

$$
\begin{array}{ccccccccc}
 & -\dfrac{a_1}{a_0} & & 0 & & 0 & & 0 & \\
0 & & \dfrac{a_2}{a_1} & & \dfrac{a_3}{a_2} & & \dfrac{a_4}{a_3} & & 0 \\
 & q_1^{(1)} & & q_0^{(2)} & & q_{-1}^{(3)} & & q_{-2}^{(4)} & \\
0 & & e_1^{(1)} & & e_0^{(2)} & & e_{-1}^{(3)} & & 0 \\
 & q_2^{(1)} & & q_1^{(2)} & & q_0^{(3)} & & q_{-1}^{(4)} & \\
0 & & e_2^{(1)} & & e_1^{(2)} & & e_0^{(3)} & & 0
\end{array}
$$

6. For the polynomial

$p(z) = 128z^4 - 256z^3 + 160z^2 - 32z + 1$

we obtain the scheme shown in table 8.5a.

Table 8.5a

  q^(1)       e^(1)       q^(2)       e^(2)       q^(3)       e^(3)       q^(4)
2.000000                 .000000                 .000000                 .000000
            -.625000                -.200000                -.031250
1.375000                 .425000                 .168750                 .031250
            -.193182                -.079412                -.005787
1.181818                 .538770                 .242375                 .037037
            -.088068                -.035725                -.000884
1.093750                 .591114                 .277215                 .037921
            -.047596                -.016754                -.000121
1.046154                 .621956                 .293848                 .038042
            -.028297                -.007915                -.000016
1.017857                 .642337                 .301748                 .038058
            -.017857                -.003718                -.000002
1.000000                 .656476                 .305464                 .038060
            -.011723                -.001730                -.000000
 .988277                 .666468                 .307194                 .038060
            -.007906                -.000798                -.000000
 .980372                 .673576                 .307992                 .038060
            -.005432                -.000365                -.000000
 .974940                 .678643                 .308356                 .038060
            -.003781                -.000166                -.000000

All e columns tend to zero, thus by theorem 8.3a the polynomial has four
zeros whose absolute values are different. The q columns converge to
the zeros, whose exact values are as follows:

$z_1 = 0.96194, \quad z_2 = 0.69134, \quad z_3 = 0.30866, \quad z_4 = 0.03806.$

7. For the polynomial

$p(z) = z^4 - 8z^3 + 39z^2 - 62z + 50$

algorithm 8.5 yields the following QD scheme:


Table 8.5b

   q^(1)        e^(1)        q^(2)        e^(2)       q^(3)      e^(3)     q^(4)
  8.000000                  .000000                  .000000                 …
              -4.875000                -1.589744                  …
  3.125000                 3.285256                  .783292                 …
              -5.125000                 -.379037                  …
 -2.000000                 8.031220                  .332033                 …
              20.580000                 -.015670                  …
 18.580000               -12.564451                -3.745220                 …
             -13.916921                 -.004671                  …
  4.663079                 1.347799                 2.521060                 …
              -4.022497                 -.008737                  …
   .640581                 5.361559                 1.208612                 …
             -33.667611                 -.001970                  …
-33.027030                39.027201                  .347820              1.652009

We now have $e_n^{(2)} \to 0$, but $e_n^{(1)}$ and $e_n^{(3)}$ do not tend to zero. This is an
indication that there are two pairs of complex conjugate zeros. To make
use of theorem 8.3c, we form the quantities

$A_n^{(k)} = q_{n+1}^{(k+1)} + q_n^{(k+2)}, \qquad B_n^{(k)} = q_n^{(k+1)} q_n^{(k+2)}$

for k = 0 and k = 2, obtaining the following values:

Table 8.5c

  A_n^(0)      B_n^(0)      A_n^(2)      B_n^(2)
 6.410256    26.282051    1.589744     0
 6.031220    25.097561    1.968780     1.282051
 6.015549    25.128902    1.984451     1.902439
 6.010878    25.042114    1.989122     1.992225
 6.002141    25.001372    1.997859     1.989741
 6.000171    25.000110    1.999829     1.996637

The limits are 6, 25, 2, 2 respectively, indicating that the polynomials

$z^2 - 6z + 25$ and $z^2 - 2z + 2$

are quadratic factors of the given polynomial. In fact,

$(z^2 - 6z + 25)(z^2 - 2z + 2) = z^4 - 8z^3 + 39z^2 - 62z + 50.$
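
The row-by-row generation of algorithm 8.5 and the quantities $A_n^{(k)}$, $B_n^{(k)}$ can be sketched in a few lines. This is an illustrative sketch, not the book's program: the function names, the list layout (one list per q row, e rows kept only internally with their zero borders), and the use of ordinary floating-point arithmetic are assumptions made here.

```python
def progressive_qd(coeffs, n_rows):
    """q rows of the QD scheme generated row by row, following algorithm 8.5.

    coeffs = [a0, a1, ..., aN]; rows[i][k-1] is the entry of q column k in row i.
    """
    if any(a == 0 for a in coeffs):
        raise ValueError("algorithm 8.5 needs all coefficients different from zero")
    N = len(coeffs) - 1
    q = [-coeffs[1] / coeffs[0]] + [0.0] * (N - 1)                          # (8-19a)
    e = [0.0] + [coeffs[k + 1] / coeffs[k] for k in range(1, N)] + [0.0]    # (8-19b), (8-20)
    rows = [q]
    for _ in range(n_rows - 1):
        q = [q[k] + e[k + 1] - e[k] for k in range(N)]                      # (8-18a)
        e = [0.0] + [q[k] / q[k - 1] * e[k] for k in range(1, N)] + [0.0]   # (8-18b)
        rows.append(q)
    return rows

def quadratic_factor(rows, k, i):
    """A and B of (8-17) for the q columns k+1 and k+2, read off at row i >= 1."""
    a = rows[i][k] + rows[i][k + 1]        # two neighbours in the same row
    b = rows[i - 1][k] * rows[i][k + 1]    # two neighbours on the same diagonal
    return a, b

rows6 = progressive_qd([128, -256, 160, -32, 1], 60)
print("example 6, last row:", [round(v, 5) for v in rows6[-1]])

rows7 = progressive_qd([1, -8, 39, -62, 50], 16)
print("example 7, k = 0:", quadratic_factor(rows7, 0, 15))
print("example 7, k = 2:", quadratic_factor(rows7, 2, 15))
```

For example 6 every row of q values should add up to $-a_1/a_0 = 2$; for example 7 the two pairs should settle near (6, 25) and (2, 2), the coefficients of the quadratic factors found above.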


We have yet to deal with the complication that arises if some of the
coefficients $a_1, a_2, \ldots, a_N$ are zero. In that case the extended QD scheme
defined by algorithm 8.5 clearly does not exist, since some of the relations
(8-19) may fail to make sense. A possible remedy is to introduce a new
variable

$z^* = z - a$

and to consider the polynomial

$p^*(z^*) = p(a + z^*) = p(a) + \frac{1}{1!} p'(a) z^* + \frac{1}{2!} p''(a) z^{*2} + \cdots + \frac{1}{N!} p^{(N)}(a) z^{*N}.$

Here a denotes a suitably chosen parameter. The coefficients of the
polynomial p* can easily be calculated by means of algorithm 3.6. It can
be shown that if p has some zero coefficients, then all coefficients of p*
are different from zero for sufficiently small values of a ≠ 0. If the
zeros $z_k^*$ of p* have been computed, those of p are given by the formula

$z_k = z_k^* + a.$
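
The change of variable can be carried out by repeated synthetic division (Horner steps). The sketch below assumes that this is the kind of computation algorithm 3.6 performs; the function name and the coefficient ordering (highest power first) are choices made here for illustration.

```python
def shift_polynomial(coeffs, a):
    """Coefficients of p*(z*) = p(a + z*), highest power first.

    Each pass is one synthetic division by (z - a); the passes get shorter by
    one entry, leaving the Taylor coefficients of p at the point a.
    """
    work = list(coeffs)
    degree = len(work) - 1
    for stop in range(degree, 0, -1):
        for i in range(1, stop + 1):
            work[i] += a * work[i - 1]
    return work

shifted = shift_polynomial([81, -108, 0, 24, 20], 1)
print(shifted)
print("all coefficients nonzero:", all(c != 0 for c in shifted))
```

A zero $z_k^*$ computed for the shifted polynomial is turned back into a zero of p by adding a, as in the formula above.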
EXAMPLE 
8. Let p(z) = 81z* — 108z° + 24z + 20. Here a, = 0, and algorithm 


] | Ϊ 174 εἰεπιοπίβ of numerical analysis the quotient-difference algorithm ὀ 175 


Ι 8.5 cannot be started. We form p* with a = 1. Scheme 3.6 turns out results given below may be used for checking purposes; failure of any of Ϊ 


i Hinks « these checks indicates excessive rounding error. | 
| 4] 108 0 24 )0 (1) Relation (8-58) implies that the sum of the g values in any row of the | 
qh of SARL: aes τὸν ΟΕ σεν scheme is constant. For the top row this sum is —4a,/a), which by Vieta’s | 
' Hi 81 ‘Setuollol παϊήγεῖδο 4 | formula (see §2.5) equals the algebraic sum of the zeros of the polynomial. | 
ii | 81 54 21 24 | Thus we have | 
ἡ 81 1353. 162 | 
i | ae; (8-21) Ὁ φῶ, 1. Gna. = —S. 
| 1 81] Qo 
i 
ἢ | The new polynomial Note that by theorem 8.3a this relation confirms Vieta’s rule for ἢ —~ oo! 
"ἢ Ϊ p*(z) = 8128} + 2162 53 + 162z*? + 242* + 17 (ii) It can be shown that the product of all g elements in any diagonal 
i sloping downward is likewise independent of n and equals the product of | 
| has all coefficients different from zero, and the first rows of its OD scheme all zeros of the polynomials p. Thus again by Vieta, 
Lil are as follows: 


11. Construct the QD scheme for the polynomial 
p(z) = 3223 — 4822 + 18z -- 1 


(iii) If the OD scheme is generated by algorithm 8.5, the quantities x,, 
are not needed. However, we may calculate the x, from (8-2) and should 


tls; 
ae 0 0 0 (8-22) Ynge, ἐπ λυ = . 
0 aoe os 4 0 It should be noted that while (8-21) checks only the additions and sub- 
| 3 tractions performed in constructing the scheme, equation (8-22) checks all 
Problems operations. 
| find 


] and determine its zeros to four significant digits. Check your result by "ε 


] observing that z τὸ 4is ἃ zero. (The convergence of the g columns may (8-23) μαι _ g®, ΓΕ Υ ee 
i | | be sped up by Aitken’s A?-process.) | ὅπ ᾿ 
12. Determine approximate values for the zeros of the polynomial rE 
ih p(z) = 70z* — 1402 + 90z* — 20z + 1. | Problem 


iI} ’ Then find ΠΌΡΕ SIEGE: SEUSS ἢ a ἀΒθΙ 15. Prove (8-22) for the QD scheme arising from a polynomial of degree 

Wh) 13. The polynomial i a> 

| | P(z) = z® — 372" — 2023 + 60z? — z — 78 [ 

has two large real zeros of opposite sign, a pair of complex conjugate Ϊ 8.7 
zeros, and a small real zero. Find approximate values for the quadratic 
factors belonging to the two large real zeros and to the pair of complex 


QD versus Newton 


In comparison with other methods for determining the zeros of a 


| 
ἢ show that algorithm 8.5 generates the correct values of (δ᾽), eo”, and qo”. 
| 


zeros. 7 polynomial, the QOD algorithm enjoys the tremendous advantage of 

ἢ) 14. Prove theorem 8.5 for polynomials of degree N = 2. [Hint: It suffices to furnishing simultaneously approximations to all zeros of a polynomial. 

| No information about the polynomial other than the values of its 
|| Coefficients is required. 

| il 8.6 Computational Checks These advantages have to be paid for by the rather slow convergence of 

| | Even if the QD scheme is generated by algorithm 8.5, excessively large the algorithm. Since the OD method contains the Bernoulli method as a 


Special case, the convergence can be no better than that of Bernoulli’s 


(or small) elements may cause some loss of accuracy. The mathematical 




method. In fact it can be shown that under the hypotheses of theorem
8.3b the errors $q_n^{(k)} - z_k$ tend to zero like the larger of the ratios

$$\left(\frac{z_k}{z_{k-1}}\right)^{n} \quad\text{and}\quad \left(\frac{z_{k+1}}{z_k}\right)^{n}.$$


Even if the figures in the q columns eventually settle down, the accuracy
of the zeros thus obtained is somewhat uncertain, because the large
number of arithmetic operations may have contaminated the scheme with
rounding error.

For the above reasons, the QD algorithm is not recommended for the
purpose of determining the zeros of a polynomial with final accuracy.
Instead, the following two-stage procedure is advocated:

Stage 1: Use the QD algorithm to obtain crude first approximations to 
the zeros, respectively to the quadratic factors containing complex 
conjugate zeros. 

Stage 2: Using these approximations as starting values, obtain the zeros 
accurately by Newton’s or Bairstow’s method. 

This combination of several methods has the advantage that the final 
values of the zeros are obtained from the original, undisturbed polynomial, 
and thus are practically free of rounding error. 

The choice of the point at which to make the change-over from QD to 
Newton-Bairstow is, to some extent, arbitrary. It is probably best to 
carry the QD scheme to a point where the division of the scheme into 
subschemes in the manner described after theorem 8.3b is clearly evident. 
On the other hand, if QD is pushed too far, a lot of computational effort
may be wasted, since Newton-Bairstow usually takes only two or three
steps to obtain the zeros very accurately even from mediocre first
approximations. To fuse the three algorithms into one working program
constitutes a challenging but rewarding problem in machine programming
which is highly recommended to the reader. One such program is
described by Watkins [1964], who also presents the results of extensive
machine tests.
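
The two-stage procedure can be sketched as follows for the simplest situation, a polynomial with real zeros of distinct absolute values and nonzero coefficients. This is only an illustration under those assumptions: the function names `qd_rows` and `newton_polish` are hypothetical, the QD sweep is written in the row-by-row (progressive) form, and the Bairstow step needed for complex conjugate zeros is not shown.

```python
def qd_rows(coeffs, steps):
    """Progressive QD sweeps for a0*z^N + a1*z^(N-1) + ... + aN.

    Assumes every coefficient is nonzero. Returns the q row reached
    after `steps` sweeps (crude approximations to the zeros).
    """
    n = len(coeffs) - 1
    # Top q row: -a1/a0 followed by zeros.
    q = [-coeffs[1] / coeffs[0]] + [0.0] * (n - 1)
    # Top e row: a_{k+1}/a_k, bordered by the zero columns e^(0) and e^(N).
    e = [0.0] + [coeffs[k + 1] / coeffs[k] for k in range(1, n)] + [0.0]
    for _ in range(steps):
        # Rhombus rule for additions and subtractions.
        q = [q[k] + e[k + 1] - e[k] for k in range(n)]
        # Rhombus rule for multiplications and divisions (uses the new q row).
        e = [0.0] + [e[k + 1] * q[k + 1] / q[k] for k in range(n - 1)] + [0.0]
    return q


def newton_polish(coeffs, z, iterations=8):
    """Newton's method on the original polynomial, evaluated by Horner's rule."""
    for _ in range(iterations):
        p, dp = 0.0, 0.0
        for a in coeffs:
            dp = dp * z + p   # derivative accumulates alongside the value
            p = p * z + a
        z -= p / dp
    return z


coeffs = [1.0, -6.0, 11.0, -6.0]          # (z - 1)(z - 2)(z - 3)
crude = qd_rows(coeffs, 20)               # Stage 1: rough values from QD
zeros = [newton_polish(coeffs, z) for z in crude]   # Stage 2: refinement
print(crude)
print(zeros)
```

Because Stage 2 works with the original coefficients, rounding errors accumulated in the QD scheme do not carry over into the final values.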


Problem 


16. By combining QD and Newton-Bairstow, compute all zeros of the 
following polynomials with an error of less than 107°: 


(a) p(z) = z* — 813 + 3923 — - 622 + 51; 
(b) p(z) = z® — 151 + 8529 — 2252? + 2642 — 120; 
(c) pz) = 425 = 52° 4- 4z* — 325 + Tz" — fz + I. 


8.8 Other Applications 
In addition to the calculation of the zeros of a polynomial, the QD
algorithm has many other applications, notably in the theory of continued
fractions, in matrix computation, and in the summation of divergent series
(see the references given below). It can also be used to furnish exact
bounds (and not only approximations) for the location of the zeros of a
polynomial. We shall mention only one further application which is
related to the one discussed above.

Suppose the function $f$ is defined by the power series

(8-24)    $f(z) = \sum_{n=0}^{\infty} a_n z^n$

where $a_n \ne 0$, $n = 0, 1, 2, \ldots$, and let it be known that the zeros $z_k$ of $f$ are
real and positive, $0 < z_1 < z_2 < \cdots$.


EXAMPLE 

9. The Bessel function of order zero can be defined by

(8-25)    $J_0(2\sqrt{x}) = \sum_{n=0}^{\infty} \dfrac{(-x)^n}{(n!)^2}$


and has the required properties. 


Obviously the QD scheme of such a function cannot be found in the 
ordinary way, because the horizontal rows are now infinite. However, 
the scheme may be generated in the following manner, as indicated by the 
arrows: 


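$$
\begin{array}{ccccccc}
-\dfrac{a_1}{a_0} & & 0 & & 0 & & \cdots \\[6pt]
 & \dfrac{a_2}{a_1} & & \dfrac{a_3}{a_2} & & \dfrac{a_4}{a_3} & \cdots
\end{array}
$$

[Diagram: arrows lead from these two rows to the further entries $q_n^{(k)}$ and $e_n^{(k)}$ of the scheme, which are filled in one upward-sloping diagonal at a time.]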




Although the scheme does not terminate on the right, more and more
diagonals sloping upward can be found. It can be shown that if the
scheme exists,

$$\lim_{n \to \infty} q_n^{(k)} = \frac{1}{z_k}, \qquad k = 1, 2, \ldots.$$

Under certain conditions even complex conjugate zeros of transcendental
functions can be found in this manner, using the method described in
theorem 8.3c and in example 7.


EXAMPLE

10. For the function defined by (8-25) the coefficients $a_n$ evidently satisfy

$$\frac{a_{n+1}}{a_n} = -\frac{1}{(n+1)^2}.$$

The first few diagonals of the QD scheme thus appear as follows:


1.000000   .000000   .000000   .000000   .000000
     -.250000  -.111111  -.062500  -.040000
 .750000   .138889   .048611   .022500   .012222
     -.046296  -.038889  -.028929  -.021728
 .703704   .146296   .058571   .029700
     -.009625  -.015570  -.014669
 .694079   .140351   .059472
     -.001946  -.006597
 .692133   .135700
     -.000382
 .691751
    .  .  .
 .691660   .131271
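
The first column of such a scheme can be reproduced with a few lines of code. The sketch below is an illustration only: the function name `qd_first_column` is hypothetical, the series is truncated after finitely many coefficients (so each sweep loses one column on the right), and the reference value for the smallest zero of $J_0$ is taken as 2.404825557695773.

```python
from math import factorial


def qd_first_column(a, steps):
    """Return the successive values of the first q column.

    `a` holds the power-series coefficients a0, a1, ..., aM (all nonzero).
    Each sweep shortens the usable part of the rows by one entry.
    """
    m = len(a) - 1
    if steps > m - 1:
        raise ValueError("not enough coefficients for that many sweeps")
    q = [-a[1] / a[0]] + [0.0] * (m - 1)            # top q row
    e = [a[k + 1] / a[k] for k in range(1, m)]      # top e row
    column = [q[0]]
    for _ in range(steps):
        left = [0.0] + e[:-1]                       # e^(k-1), with e^(0) = 0
        q = [q[k] + e[k] - left[k] for k in range(len(e))]
        e = [e[k] * q[k + 1] / q[k] for k in range(len(q) - 1)]
        column.append(q[0])
    return column


# Coefficients of J0(2*sqrt(x)) = sum (-x)^n / (n!)^2.
a = [(-1) ** n / factorial(n) ** 2 for n in range(16)]
column = qd_first_column(a, 12)
print(column[:5])
print(column[-1], 4 / 2.404825557695773 ** 2)   # limit is 1/z_1
```

The printed column starts 1, .75, .703704, ... and settles near .691660, the reciprocal of the smallest zero of the series.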
| | Problems 


| 17. Obtain an approximate value of V 2/m by applying the algorithm described 
I above to the function 

| 

| 


> heer 
cos Ma 2 nt, 


18. Apply the above version of the QD algorithm to the problem of finding
the small solutions of the transcendental equation

$$\tan z = cz$$

by setting

$$f(z^2) = \frac{\sin z}{z} - c \cos z.$$

Find the smallest positive solution for c = 1.2, and a pair of purely
imaginary solutions for c = 0.8.




Recommended Reading 


The QD algorithm was introduced by Rutishauser in a series of classical 
papers which are collected in the volume Rutishauser [1956]. A somewhat 
more elementary treatment is given by Henrici [1958]. A multitude of 
applications are discussed in Henrici [1963]. 


Research Problem 

How can the argument of a complex zero of a real polynomial be
determined from the signs of the elements of the corresponding q column?
Consider the case N = 2 first.


PART TWO 
INTERPOLATION 


AND 


APPROXIMATION 


chapter 9 the interpolating polynomial 


So far we have been concerned mainly with the problem of approximating
numbers (such as the zeros of a polynomial or the solutions of systems of
nonlinear equations). We now turn to the problem of approximating
functions and, more generally, numbers, such as derivatives and integrals,
that depend on an infinity of values of a function.

The most common method of approximating functions is the approximation
by polynomials. Among the various types of polynomial approximation
that are in use the one that is most flexible and most easily
constructed (although not always the most effective) is the approximation
by the interpolating polynomial.


9.1 Existence of the Interpolating Polynomial


Let the real function $f$ be defined on an interval $I$, and let $x_0, x_1, \ldots, x_n$
be $n + 1$ distinct points of $I$. It is not assumed that these points are
equidistant, nor even that they are in their natural order.

We shall write for brevity

$$f(x_k) = f_k, \qquad k = 0, 1, \ldots, n.$$
᾿ 


Theorem 9.1 There exists a unique polynomial $P$ of degree not
exceeding $n$ (the so-called Lagrangian interpolating polynomial) such
that

(9-1)    $P(x_k) = f_k, \qquad k = 0, 1, \ldots, n.$


Proof. As usual, the proofs of existence and of uniqueness require
separate arguments. The existence of the polynomial $P$ is proved if we
can establish the existence of polynomials $L_k$ ($k = 0, 1, \ldots, n$) with the
following properties:

(i) Each $L_k$ is a polynomial of degree $\le n$;
(ii) For $x = x_k$, $L_k$ has the special value

(9-2a)    $L_k(x_k) = 1$;

however, if $m \ne k$, then

(9-2b)    $L_k(x_m) = 0$.


Assuming the existence of these $L_k$, we can set

(9-3)    $P(x) = \sum_{k=0}^{n} f_k L_k(x).$

The function $P$ is a sum of polynomials of degree $\le n$ with constant factors
and thus itself a polynomial of degree $\le n$. Furthermore, if we set
$x = x_m$, then by (ii) all $L_k(x_m)$ are zero except for the one with $k = m$, and this
has the value 1. Thus we find

$$P(x_m) = f_m,$$

as required.

The polynomials $L_k$ are called the Lagrangian interpolation coefficients.
To prove the existence of $L_k$, observe that the product

$$\prod_{m \ne k} \frac{x - x_m}{x_k - x_m} = \frac{(x - x_0) \cdots (x - x_{k-1})(x - x_{k+1}) \cdots (x - x_n)}{(x_k - x_0) \cdots (x_k - x_{k-1})(x_k - x_{k+1}) \cdots (x_k - x_n)}$$

has the required properties. Indeed, as a product of $n + 1 - 1 = n$
linear factors it represents a polynomial of degree $n$; furthermore, if
$x = x_k$, then all factors have the value 1, thus the product has the value 1
also. On the other hand, if $x = x_m$, where $m \ne k$, then the factor
containing $x - x_m$ is zero, and the product vanishes. Thus, the polynomials

(9-4)    $L_k(x) = \prod_{\substack{m=0 \\ m \ne k}}^{n} \dfrac{x - x_m}{x_k - x_m}, \qquad k = 0, 1, \ldots, n,$

have the required properties, and the existence of the polynomial $P$ is
proved.

In order to show the uniqueness of the interpolating polynomial, assume
there exist two interpolating polynomials, $P$ and $Q$, say. Then their
difference $D = P - Q$, being the difference of two polynomials of degree
not exceeding $n$, is again a polynomial of degree not exceeding $n$.
Moreover,

$$D(x_k) = P(x_k) - Q(x_k) = f_k - f_k = 0$$

for $k = 0, 1, \ldots, n$. The polynomial $D$ thus has $n + 1$ zeros and hence,
being a polynomial of degree $\le n$, must vanish identically. It follows that
$P = Q$. This completes the proof of theorem 9.1.


EXAMPLE 

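1. To find the interpolating polynomial for the following $x_k$ and $f_k$:

$x_k$:   2    3   -1    4
$f_k$:   1    2    3    4

We first calculate the Lagrangian interpolation coefficients. Formula
(9-4) yields

$$
\begin{aligned}
L_0(x) &= \frac{(x-3)(x+1)(x-4)}{(2-3)(2+1)(2-4)} = \tfrac{1}{6}(x-3)(x+1)(x-4),\\
L_1(x) &= \frac{(x-2)(x+1)(x-4)}{(3-2)(3+1)(3-4)} = -\tfrac{1}{4}(x-2)(x+1)(x-4),\\
L_2(x) &= \frac{(x-2)(x-3)(x-4)}{(-1-2)(-1-3)(-1-4)} = -\tfrac{1}{60}(x-2)(x-3)(x-4),\\
L_3(x) &= \frac{(x-2)(x-3)(x+1)}{(4-2)(4-3)(4+1)} = \tfrac{1}{10}(x-2)(x-3)(x+1).
\end{aligned}
$$

Formula (9-3) thus yields

$$
\begin{aligned}
P(x) = {} & \tfrac{1}{6}(x-3)(x+1)(x-4) - \tfrac{1}{2}(x-2)(x+1)(x-4) \\
& - \tfrac{1}{20}(x-2)(x-3)(x-4) + \tfrac{2}{5}(x-2)(x-3)(x+1).
\end{aligned}
$$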


It can be verified that this polynomial has the required properties. 
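
Such a verification is easily mechanized. The following sketch evaluates the Lagrangian form (9-3) with exact rational arithmetic; the function names are hypothetical, and the ordinates 1, 2, 3, 4 attached to the nodes 2, 3, -1, 4 are sample values chosen for illustration.

```python
from fractions import Fraction


def lagrange_coefficient(nodes, k, x):
    """Value at x of the k-th Lagrangian interpolation coefficient, formula (9-4)."""
    result = Fraction(1)
    for m, xm in enumerate(nodes):
        if m != k:
            result *= Fraction(x - xm) / Fraction(nodes[k] - xm)
    return result


def lagrange_value(nodes, values, x):
    """Value at x of the interpolating polynomial in the form (9-3)."""
    return sum(f * lagrange_coefficient(nodes, k, x)
               for k, f in enumerate(values))


nodes = [2, 3, -1, 4]        # interpolating points, not in natural order
values = [1, 2, 3, 4]        # illustrative ordinates (an assumption)

for xk, fk in zip(nodes, values):
    print(xk, lagrange_value(nodes, values, xk), fk)   # the two values agree
print(lagrange_value(nodes, values, 0))
```

Exact fractions are used here only so that the interpolation conditions can be checked without rounding; in practice floating-point arithmetic would be the usual choice.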


It will be noted that the representation of the interpolating polynomial
given in the proof of theorem 9.1 does not give the polynomial in the
customary form

$$P(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n.$$

Of course, the polynomial could be put into the above standard form, but
there usually is no particular reason for doing so. It is well to distinguish
at this point between the function $P$ and the various representations of $P$.
As a function (i.e., as a set of ordered pairs $(x, P(x))$), $P$ is unique.
However, there may be many ways of representing $P$ by an explicit formula.
Each formula suggests a certain algorithm for calculating $P$. It is not
claimed that the algorithm suggested by (9-3) is the most effective from
the numerical point of view. Many other algorithms for constructing
the polynomial will be discussed in chapters 10 and 11.


Problems 


1. Verify that the case n = 1 of (9-3) yields the familiar formula for linear 
interpolation. What is the meaning of theorem 9.1 when n = 0? 




2. Is the interpolating polynomial constructed above always of the exact 
degree n? 

3. Show that for $x \ne x_k$, $k = 0, 1, \ldots, n$, the interpolating polynomial can
be represented in the form

$$P(x) = L(x) \sum_{k=0}^{n} \frac{f_k}{(x - x_k)\, L'(x_k)},$$

where

$$L(x) = (x - x_0)(x - x_1) \cdots (x - x_n).$$

Verify by L'Hôpital's rule (see Taylor [1959], p. 456) that the limit of the
expression on the right as $x \to x_m$ is $f_m$.

4. Prove: If $f$ is a polynomial of degree $n$ or less, then $P = f$.


9.2 The Error of the Interpolating Polynomial


Since we wish to use the interpolating polynomial to approximate the
function $f$ at points which do not belong to the set of interpolating points
$x_k$, we are interested in estimating the difference $P(x) - f(x)$ for $x \in I$.
It is clear that without further hypotheses nothing whatever can be said
about this quantity. For we can change the function $f$ at will at points
which are not interpolating points without changing the polynomial $P$
at all (see Fig. 9.2).

A definite statement can be made, however, if we assume a qualitative
knowledge of the derivatives of the function $f$.


Theorem 9.2 In addition to the hypotheses of theorem 9.1, let $f$ be
$n + 1$ times continuously differentiable on the interval $I$. Then to
each $x \in I$ there exists a point $\xi_x$ located in the smallest interval
containing the points $x, x_0, x_1, \ldots, x_n$ such that

(9-5)    $f(x) - P(x) = \dfrac{1}{(n+1)!}\, L(x)\, f^{(n+1)}(\xi_x),$

where

$$L(x) = (x - x_0)(x - x_1) \cdots (x - x_n).$$

Proof. If $x$ is one of the points $x_k$, there is nothing to prove, since both
sides of (9-5) vanish for arbitrary $\xi_x$. If $x$ has a fixed value different from
any of the points $x_k$, consider the auxiliary function $F = F(t)$ defined by

(9-6)    $F(t) = f(t) - P(t) - cL(t),$

where

$$c = \frac{f(x) - P(x)}{L(x)}.$$

We have

$$F(x_k) = f(x_k) - P(x_k) - cL(x_k) = f_k - f_k - 0 = 0, \qquad k = 0, 1, \ldots, n,$$

and also

$$F(x) = f(x) - P(x) - cL(x) = 0,$$

by the definition of $c$. The function $F$ thus has at least $n + 2$ distinct
zeros in the interval $I$. By Rolle's theorem, the derivative $F'$ must have
at least $n + 1$ zeros in the smallest interval containing $x$ and the $x_k$, the
second derivative must have no less than $n$ zeros, and finally the $(n + 1)$st
derivative must have at least one zero. Let $\xi_x$ be one such zero. We now
differentiate (9-6) $n + 1$ times and set $t = \xi_x$. The $(n + 1)$st derivative of
$P$ is zero. Since $L$ is a polynomial with leading term $x^{n+1}$, the $(n + 1)$st
derivative of $cL$ is $c(n + 1)!$. We thus have

$$0 = F^{(n+1)}(\xi_x) = f^{(n+1)}(\xi_x) - c(n + 1)!$$

or, remembering the definition of $c$ and rearranging,

$$cL(x) = f(x) - P(x) = \frac{1}{(n+1)!}\, L(x)\, f^{(n+1)}(\xi_x),$$

as was to be shown.


Equation (9-5) cannot be used, of course, to calculate the exact value of
the error $f - P$, since $\xi_x$ as a function of $x$ is, in general, not known.
(An exception occurs when the $(n + 1)$st derivative of $f$ is constant; see
below.) However, as is shown in the examples below, the formula can
be used in many cases to find a bound for the error of the interpolating
polynomial.

We also shall require the following fact.


Corollary 9.2 Under the hypotheses of theorem 9.2, the quantity
$f^{(n+1)}(\xi_x)$ in (9-5) can be defined as a continuous function of $x$ for
$x \in I$.

Proof. Define the function $g = g(x)$ by

$$
g(x) =
\begin{cases}
\dfrac{f(x) - P(x)}{L(x)}, & x \ne x_k,\ k = 0, 1, \ldots, n, \\[10pt]
\dfrac{f'(x_k) - P'(x_k)}{L'(x_k)}, & x = x_k.
\end{cases}
$$

This function is continuous for $x \ne x_k$ and, by L'Hôpital's rule (see Taylor
[1959], p. 456), also at the points $x_k$. For $x \ne x_k$,

$$f^{(n+1)}(\xi_x) = (n + 1)!\, g(x),$$

establishing the corollary. An application of this corollary will be made
in chapters 12 and 13.


EXAMPLES 
2. One interpolating point. If there is only one interpolating point $x_0$,
the interpolating polynomial reduces to the constant $f_0$. Formula (9-5)
yields

$$f(x) - f(x_0) = (x - x_0)\, f'(\xi_x),$$

where $\xi_x$ lies between $x_0$ and $x$. This is the familiar mean value theorem
of the differential calculus.

3. Two interpolating points. The linear interpolating polynomial is
given by

$$P(x) = \frac{(x_1 - x) f_0 + (x - x_0) f_1}{x_1 - x_0}.$$

Equation (9-5) yields the error formula

$$f(x) - P(x) = \frac{(x - x_0)(x - x_1)}{2}\, f''(\xi_x).$$

What is the maximum error that can occur if we know that $|f''(x)| \le M_2$
and $x$ is between $x_0$ and $x_1$? The maximum of the function

$$|(x - x_0)(x - x_1)|$$

between $x_0$ and $x_1$ occurs at $x = \tfrac{1}{2}(x_0 + x_1)$ and has the value $\tfrac{1}{4}(x_1 - x_0)^2$.
Thus we find

$$|f(x) - P(x)| \le \frac{(x_1 - x_0)^2}{8}\, M_2$$

in this case. Application: If we calculate the value of $\sin x$ from a sine
table with step $h$, using linear interpolation, the error is bounded by
$\tfrac{1}{8}h^2$, since $M_2 = 1$ in this case.
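
The bound $(x_1 - x_0)^2 M_2 / 8$ for linear interpolation can be checked numerically. The sketch below samples the error of linear interpolation in a sine table of step $h = 0.1$; the function name, the sampling density, and the table range are assumptions made for the illustration.

```python
import math


def linear_interpolation_error(f, x0, x1, samples=1000):
    """Largest sampled value of |f(x) - P(x)| for x between x0 and x1."""
    f0, f1 = f(x0), f(x1)
    worst = 0.0
    for i in range(samples + 1):
        x = x0 + (x1 - x0) * i / samples
        p = ((x1 - x) * f0 + (x - x0) * f1) / (x1 - x0)
        worst = max(worst, abs(f(x) - p))
    return worst


h = 0.1
# Sine table at 0, 0.1, ..., 2.0; take the worst error over all intervals.
worst = max(linear_interpolation_error(math.sin, k * h, (k + 1) * h)
            for k in range(20))
bound = h * h / 8          # (x1 - x0)^2 * M2 / 8 with M2 = 1
print(worst, bound)
```

The observed maximum stays below the bound and comes close to it in the interval nearest to $\pi/2$, where $|\sin''|$ is nearly 1.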
4. Error in cubic interpolation. We assume that the four interpolating
points are equidistant, $x_k = x_0 + kh$, $k = 1, 2, 3$, and that $x$ (the point
where the value of the function $f$ is sought) always lies between $x_1$ and $x_2$.
We set

$$M_4 = \max_{x_0 \le x \le x_3} |f^{(4)}(x)|.$$

The interpolation error is then bounded by $(1/4!)M_4$ times the maximum
of the absolute value of the function

$$L(x) = (x - x_0)(x - x_1)(x - x_2)(x - x_3)$$

in the interval $x_1 \le x \le x_2$. For reasons of symmetry this maximum
occurs at $x = (x_1 + x_2)/2$ and has the value

$$\left[\left(\tfrac{3}{2}h\right)\left(\tfrac{1}{2}h\right)\right]^2 = \tfrac{9}{16}h^4.$$

It follows that the interpolation error is bounded by

$$\frac{3}{128}\, h^4 M_4.$$

In a sine table, for instance, using cubic interpolation and a step as large
as $h = 0.1$ we get a maximum error of less than $2.5 \times 10^{-6}$.
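
The bound $\tfrac{3}{128} h^4 M_4$ for cubic interpolation in the middle interval can be tested in the same spirit. The following sketch is illustrative only; the function name and the choice of table range are assumptions.

```python
import math


def cubic_interpolation_error(f, x0, h, samples=400):
    """Largest sampled error of cubic interpolation on [x0 + h, x0 + 2h]."""
    nodes = [x0 + k * h for k in range(4)]
    values = [f(x) for x in nodes]
    worst = 0.0
    for i in range(samples + 1):
        x = nodes[1] + h * i / samples
        p = 0.0
        for k in range(4):
            term = values[k]
            for m in range(4):
                if m != k:
                    term *= (x - nodes[m]) / (nodes[k] - nodes[m])
            p += term
        worst = max(worst, abs(f(x) - p))
    return worst


h = 0.1
worst = max(cubic_interpolation_error(math.sin, k * h, h) for k in range(30))
bound = 3 / 128 * h ** 4       # M4 = 1 for the sine function
print(worst, bound)
```

With $h = 0.1$ the bound is about $2.34 \times 10^{-6}$, consistent with the figure of less than $2.5 \times 10^{-6}$ for a sine table.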


The basic error formula (9-5) requires the knowledge of a bound of the
derivatives of the function $f$. Such bounds can often be obtained very
easily even for non-elementary functions by exploiting known functional
relations.

EXAMPLE 
5. The Bessel function of order zero can be defined by

$$J_0(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x \sin t)\, dt.$$

By differentiating under the integral sign,

$$J_0'(x) = -\frac{1}{\pi} \int_0^{\pi} \sin t\, \sin(x \sin t)\, dt,$$

$$J_0''(x) = -\frac{1}{\pi} \int_0^{\pi} (\sin t)^2 \cos(x \sin t)\, dt,$$

etc. The integrands of all integrals which we obtain by differentiation are
bounded in absolute value by 1. Thus

$$|J_0^{(n)}(x)| \le 1, \qquad n = 0, 1, 2, \ldots.$$


Problems 


5. A table of a function of one variable is well suited for linear interpolation
if the error due to interpolation does not exceed the rounding error of the
entries. What is the greatest permissible step of such a "well
interpolable" table of $\cos x$ as a function of the number of decimal places,
(a) if $x$ is given in radians; (b) if $x$ is given in degrees? Make a survey of
some tables accessible to you and decide whether they are well suited for
linear interpolation.

6. What is the maximum value of the combined error due to linear
interpolation and rounding of the formula

$$f(x) \approx \frac{x - x_0}{x_1 - x_0}\, f_1 + \frac{x_1 - x}{x_1 - x_0}\, f_0$$

if $f_1$ and $f_0$ are known to $N$ places, and if the products are rounded to $N$
places? (It may be assumed that the fractions $(x - x_0)/(x_1 - x_0)$ and
$(x_1 - x)/(x_1 - x_0)$ are exact decimal fractions.)

7. The function $\log_{10}(\sin x)$, where $x$ is given in degrees, is tabulated to five
decimal places with a step of 1/60 of one degree. From what value of $x$ on
is this table well suited for linear interpolation?

8. The function $f(x) = \sqrt{x}$ is tabulated at the integers, $x = 1, 2, 3, \ldots$,
giving four decimals. From what $x$ on is this table well suited for linear
interpolation?

9. The Bessel function of order $n$ can be defined by

$$J_n(x) = \frac{1}{\pi} \int_0^{\pi} \cos(x \sin t - nt)\, dt.$$

How do we have to choose the step $h$ of a table of $J_n$ so that the error is
less than $10^{-6}$
(a) if linear interpolation is to be used?
(b) if cubic interpolation (as described in example 4) is to be used?
10. Interpolation near the end of a table. The fourth derivative of a function $f$
is known to be bounded by $M_4$. Let $P$ be the polynomial interpolating
$f$ at the points $x_k = kh$, $k = 0, 1, 2, 3$. Give the best possible bound for

$$\max |f(x) - P(x)|.$$

11. Theorem 9.2 implies that if $|f^{(n+1)}(x)| \le M_{n+1}$, $x \in I$, then, in the
notation of theorem 9.2,

$$|f(x) - P(x)| \le \frac{M_{n+1}}{(n+1)!}\, |L(x)|, \qquad x \in I.$$

Are there any functions $f$ (and corresponding points in $I$) for which this
inequality becomes an equality?

12. Let $n$ be a positive integer. Somebody proposes to calculate the value of
$e^{n+1}$ by constructing the polynomial $P$ interpolating the function $f(x) = e^x$
at the points $x = 0, 1, \ldots, n$ and evaluating $P$ for $x = n + 1$.

(a) Indicate a lower bound for the error $e^{n+1} - P(n + 1)$.

(b) Determine the number $\xi_x$ of theorem 9.2, and thus obtain an exact
expression for the error.

13. The function $f$ is defined on [0, 1] and is known to have a bounded second
derivative. Its values are to be computed from a fixed interpolating
polynomial using two interpolating points $x_0$ and $x_1$. How should one
place the points $x_0$ and $x_1$ in the interval [0, 1] in order to minimize the
error due to interpolation?


9.3 Convergence of Sequences of Interpolating Polynomials 


Let the function $f$ be defined for $-\infty < x < \infty$, and let it and all its
derivatives be bounded by one and the same constant,

$$|f^{(n)}(x)| \le M, \qquad n = 0, 1, 2, \ldots.$$

Assume one wishes to calculate $f(x)$ in the interval $[0, h]$ by means of the
interpolating polynomial $P$ of degree $2n - 1$ using the interpolating points

$$
\begin{aligned}
x_0 &= 0, & x_1 &= h, \\
x_2 &= -h, & x_3 &= 2h, \\
&\ \vdots & &\ \vdots \\
x_{2n-2} &= -(n-1)h, & x_{2n-1} &= nh.
\end{aligned}
$$

For what values of $h$ does the error tend to zero as $n \to \infty$?

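Obviously, the above procedure cannot be effective for unrestricted
values of $h$, as the example $f(x) = \sin x$, $h = \pi$ shows. (All interpolating
polynomials are zero in this case.) By theorem 9.2, the interpolation
error of $P$ is bounded by $M|L(x)|/(2n)!$, where

$$L(x) = [x + (n-1)h][x + (n-2)h] \cdots [x - nh].$$

The maximum of the function $|L(x)|$ on the interval $[0, h]$ occurs at the
point $x = h/2$. At this point,

$$\left|L\!\left(\frac{h}{2}\right)\right| = \left[\left(n - \tfrac{1}{2}\right)h \left(n - \tfrac{3}{2}\right)h \cdots \tfrac{3}{2}h \cdot \tfrac{1}{2}h\right]^2 = \left[(2n-1)(2n-3) \cdots 3 \cdot 1\right]^2 \left(\frac{h}{2}\right)^{2n} = \left[\frac{(2n)!}{2^n\, n!}\right]^2 \left(\frac{h}{2}\right)^{2n},$$

so that

$$\frac{1}{(2n)!} \left|L\!\left(\frac{h}{2}\right)\right| = \frac{(2n)!}{2^{2n} (n!)^2} \left(\frac{h}{2}\right)^{2n}.$$

Using Stirling's asymptotic formula for $n!$,

$$n! \sim \sqrt{2\pi n} \left(\frac{n}{e}\right)^n$$

(see Buck [1956], p. 159) we find

$$\frac{1}{(2n)!} \left|L\!\left(\frac{h}{2}\right)\right| \sim \frac{1}{\sqrt{\pi n}} \left(\frac{h}{2}\right)^{2n}.$$

The last expression tends to zero as $n \to \infty$ if and only if $|h| \le 2$. Thus the
convergence of the interpolation process described above can be guaranteed
only if $|h| \le 2$.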
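
The behavior of the factor $|L(h/2)|/(2n)!$ can be examined numerically. The sketch below compares the direct product with the closed form $(2n)!/(n!)^2 \cdot (h/4)^{2n}$ and with the Stirling estimate $(h/2)^{2n}/\sqrt{\pi n}$; logarithms are used to avoid overflow, and all function names are hypothetical.

```python
import math


def direct_error_factor(n, h):
    """|L(h/2)| / (2n)! from the product over the points -(n-1)h, ..., nh."""
    x = h / 2
    product = math.prod(abs(x - j * h) for j in range(-(n - 1), n + 1))
    return product / math.factorial(2 * n)


def log_error_factor(n, h):
    """Logarithm of (2n)! / (n!)^2 * (h/4)^(2n)."""
    return (math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
            + 2 * n * math.log(h / 4))


def log_stirling_estimate(n, h):
    """Logarithm of (h/2)^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log(h / 2) - 0.5 * math.log(math.pi * n)


for h in (1.9, 2.1):
    print(h, [round(log_error_factor(n, h), 2) for n in (50, 100, 200, 400)])
```

For $h = 1.9$ the logarithms decrease steadily, while for $h = 2.1$ they increase, in agreement with the threshold $|h| = 2$.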

The sequence of interpolating polynomials constructed above looks
somewhat unnatural in view of the fact that we use interpolating points
farther and farther removed from the interval where we wish to approximate
the function $f$. The following question, however, is very natural:
Let $f$ be continuous on the interval [0, 1] and denote by $P_n(x)$ the
polynomial interpolating $f$ at the points

$$x_k = \frac{k}{n}, \qquad k = 0, 1, \ldots, n.$$

Is it true that

(9-7)    $\lim_{n \to \infty} P_n(x) = f(x)$

for all $x \in [0, 1]$? An important result due to Runge states that there are
continuous functions for which (9-7) does not hold. (A simple example is
$f(x) = |x - \tfrac{1}{2}|$.) Actually, the relation (9-7) even fails to hold for some
functions which have derivatives of all orders.

It is important, however, to understand Runge's result correctly. The
result does not mean that a continuous function cannot always be
approximated by polynomials. In fact, a famous theorem due to Weierstrass
(see Buck [1956], p. 39) states that every $f$ continuous on a closed finite
interval $I$ can be approximated by polynomials to any desired accuracy.
Runge's result merely states that these approximating polynomials can in
general not be obtained by interpolation at uniformly spaced points.
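
The failure of (9-7) can be observed experimentally. The sketch below interpolates $f(x) = |x - \tfrac{1}{2}|$ at $n + 1$ uniformly spaced points of [0, 1] and samples the largest deviation; the function names and the sampling grid are assumptions, and the plain Lagrangian formula is used for simplicity rather than efficiency.

```python
def interpolate(nodes, values, x):
    """Evaluate the interpolating polynomial at x by the Lagrangian formula."""
    total = 0.0
    for k, xk in enumerate(nodes):
        term = values[k]
        for m, xm in enumerate(nodes):
            if m != k:
                term *= (x - xm) / (xk - xm)
        total += term
    return total


def max_error_uniform(n, samples=1001):
    """Largest sampled |f - P_n| on [0, 1] for f(x) = |x - 1/2|."""
    nodes = [k / n for k in range(n + 1)]
    values = [abs(x - 0.5) for x in nodes]
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(abs(x - 0.5) - interpolate(nodes, values, x)) for x in grid)


for n in (10, 20, 40):
    print(n, max_error_uniform(n))
```

The sampled error grows rapidly with $n$; the large deviations occur near the ends of the interval, between the interpolating points.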


Problems 
14. Missing entry in a table. A function $f$ is defined on the whole real line
and satisfies

$$|f^{(n)}(x)| \le M^n, \qquad n = 0, 1, 2, \ldots; \quad -\infty < x < \infty,$$

for some constant $M$. For $n = 1, 2, \ldots$ let $P_{2n-1}$ denote the polynomial
interpolating $f$ at the points $-n, -n + 1, \ldots, -1, 1, \ldots, n - 1, n$. Prove
that

$$\lim_{n \to \infty} P_{2n-1}(0) = f(0)$$

holds provided that $M < 2$.
15. A function fis defined for x = 0 and satisfies 


—_— 
--ἔ 


Leal 355. x 2 Ὁ, a 1.2... 


For a fixed value of $h$, let $P_n$ denote the polynomial interpolating $f$ at the
points $0, h, 2h, \ldots, nh$. For what values of $h$ can you guarantee that

$$\lim_{n \to \infty} P_n(x) = f(x)$$

for every fixed value of $x > 0$?

16*. Let the function $f$ be continuous on the interval [0, 1]. From the fact
that such a function is uniformly continuous (see Buck [1956], p. 34) one
can easily prove that $f$ can be approximated to arbitrary accuracy by the
piecewise linear function coinciding with $f$ at suitable points $x_0, x_1, \ldots,
x_n$. Thus, in a sense, $f$ can be approximated arbitrarily well by linear
interpolation. Why does this not contradict Runge's theorem?

17. A function $f$ is defined on [0, 1], and its derivatives satisfy

$$|f^{(n)}(x)| \le n!, \qquad n = 0, 1, 2, \ldots; \quad 0 \le x \le 1.$$

(Example: $f(x) = (1 + x)^{-1}$.) Let $P_n$ denote the polynomial
interpolating $f$ at the points $1, q, q^2, \ldots, q^n$, where $q$ is some number such that
$0 < q < 1$. Show that

$$\lim_{n \to \infty} P_n(0) = f(0).$$


9.4 How to Approximate a Polynomial of Degree n by One of Degree n - 1

Let $Q$ be a polynomial of degree $n$ with leading coefficient 1,

$$Q(x) = x^n + a_1 x^{n-1} + \cdots + a_n.$$

We wish to interpolate $Q$ in the interval $[-1, 1]$ by a polynomial $P$ of
degree $n - 1$ such that the maximum of the error $|Q(x) - P(x)|$ is
minimized. How do we have to choose the interpolating points $x_0, x_1,
\ldots, x_{n-1}$, and how large is the smallest possible maximum error?

From the general error formula (9-5) we find, since $Q^{(n)}(x) = n!$,

(9-8)    $Q(x) - P(x) = L(x)$

where

$$L(x) = (x - x_0)(x - x_1) \cdots (x - x_{n-1}) = x^n + \cdots.$$


Our problem is thus equivalent to the problem of selecting the points
$x_0, x_1, \ldots, x_{n-1}$ in such a manner that the quantity

$$\max_{-1 \le x \le 1} |(x - x_0)(x - x_1) \cdots (x - x_{n-1})|$$

is minimized. Although seemingly difficult, this problem can be solved
explicitly.

Theorem 9.4 The best choice of the interpolating points $x_0, x_1, \ldots,
x_{n-1}$ for the approximation of the polynomial $Q(x) = x^n + \cdots$ in the
interval $-1 \le x \le 1$ by a polynomial of degree $n - 1$ is the choice
for which

(9-9)    $L(x) = \dfrac{1}{2^{n-1}}\, T_n(x),$

where $T_n$ denotes the $n$th Chebyshev polynomial,

$$T_n(x) = \cos(n \arccos x).$$


Proof. Let us first convince ourselves that the function $L$ defined by
(9-9) really is a polynomial of degree $n$ with leading coefficient 1. From
the difference equation satisfied by the Chebyshev polynomials,

$$T_n(x) = 2x\, T_{n-1}(x) - T_{n-2}(x)$$

and from the fact that $T_0(x) = 1$, $T_1(x) = x$ it readily follows that $T_n$ is a
polynomial in $x$ with leading coefficient $2^{n-1}$. Hence our assertion on $L$
follows immediately.

Since we are interested in minimizing the maximum of $|L(x)|$, let us
calculate the extrema of $L(x)$. The extrema of $\cos x$ occur for $x = k\pi$,
where $k$ is an integer, hence the extrema of $L(x)$ in the interval $[-1, 1]$
occur at the points where $n \arccos x = k\pi$, i.e., for $x = t_k$, where

$$t_k = \cos\frac{k\pi}{n}, \qquad k = 0, 1, \ldots, n.$$

The values of $L$ at these points are

$$L(t_k) = (-1)^k\, 2^{-n+1},$$

i.e., the extrema all have the same absolute value $2^{-n+1}$, but oscillate in
sign (see Fig. 9.4).
† See example 3, chapter 6.

Figure 9.4 The function L(x) (n = 9).


Now suppose there exists another polynomial $M(x) = x^n + \cdots$ for
which $|M(x)|$ has a smaller maximum $m < 2^{-n+1}$ in $[-1, 1]$. Then the
difference polynomial

$$D(x) = L(x) - M(x)$$

would at the $n + 1$ points $t_k$ have the same sign as $L$, i.e.,

$$
D(t_k)
\begin{cases}
> 0, & k \text{ even}, \\
< 0, & k \text{ odd}.
\end{cases}
$$

Since $t_0 > t_1 > t_2 > \cdots > t_n$ it would follow that $D$ has at least $n$ distinct
zeros, namely one in each interval $[t_{k+1}, t_k]$. However, since both $L$ and
$M$ have leading coefficient 1, it follows that $D$ is a polynomial of degree
$\le n - 1$, hence $D$ cannot have $n$ distinct zeros without vanishing
identically. The assumption of the existence of a polynomial $M$ with a
maximum deviation from 0 smaller than $2^{-n+1}$ thus has led to
a contradiction.


The interpolating points $x_k$ for the best approximation of $Q$ by a
polynomial of lower degree are the zeros of $L$, that is the points $x = x_k$
satisfying

$$n \arccos x_k = \left(k + \tfrac{1}{2}\right)\pi, \qquad k = 0, 1, \ldots, n - 1.$$

It follows that the interpolating points are given by

$$x_k = \cos\left(\frac{2k + 1}{2n}\, \pi\right), \qquad k = 0, 1, \ldots, n - 1.$$


EXAMPLE

6. How can we best approximate the function

$$f(x) = x^2 + ax + b$$

in $-1 \le x \le 1$ by a straight line? This is the case $n = 2$ of theorem 9.4.
We have

$$x_0 = \cos\frac{\pi}{4} = \frac{\sqrt{2}}{2}, \qquad x_1 = \cos\frac{3\pi}{4} = -\frac{\sqrt{2}}{2}.$$

The interpolating polynomial is given by

$$
\begin{aligned}
P(x) &= \frac{f(x_0)(x - x_1) + f(x_1)(x_0 - x)}{x_0 - x_1} \\
&= \frac{\left(\tfrac{1}{2} + a x_0 + b\right)(x - x_1) + \left(\tfrac{1}{2} + a x_1 + b\right)(x_0 - x)}{x_0 - x_1} \\
&= ax + b + \tfrac{1}{2}.
\end{aligned}
$$

The maximum deviation from $x^2 + ax + b$ in $[-1, 1]$ is $\tfrac{1}{2}$, as predicted
by the theory.


It is not necessary to use the interpolation points $x_k$ in order to construct
the polynomial $P$ of best approximation. From the error formula (9-8)
we find, if $L$ is given by (9-9),

(9-10)    $P(x) = Q(x) - \dfrac{1}{2^{n-1}}\, T_n(x).$


EXAMPLE

7. We consider once more the problem of example 6. From the
recurrence relation we easily find $T_2(x) = 2x^2 - 1$, hence

$$P(x) = x^2 + ax + b - \tfrac{1}{2}(2x^2 - 1) = ax + b + \tfrac{1}{2},$$

in accordance with the earlier result.
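
The optimal property of the Chebyshev points can be illustrated numerically. The sketch below forms the points $x_k = \cos((2k+1)\pi/(2n))$, checks that the product $(x - x_0) \cdots (x - x_{n-1})$ agrees with $2^{1-n} T_n(x)$, and compares its maximum on $[-1, 1]$ with that for equally spaced points. The function names, the value $n = 5$, and the sampling grid are choices made for the illustration.

```python
import math


def chebyshev_points(n):
    """Zeros of the n-th Chebyshev polynomial."""
    return [math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n)]


def node_polynomial(nodes, x):
    """Value of (x - x_0)(x - x_1)...(x - x_{n-1})."""
    return math.prod(x - xk for xk in nodes)


def chebyshev_t(n, x):
    """T_n(x) from the recurrence T_n = 2x T_{n-1} - T_{n-2}."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t


def max_abs_node_polynomial(nodes, samples=2001):
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(node_polynomial(nodes, x)) for x in grid)


n = 5
cheb = chebyshev_points(n)
equi = [-1 + 2 * k / (n - 1) for k in range(n)]
print(max_abs_node_polynomial(cheb), 2.0 ** (1 - n))   # these two agree
print(max_abs_node_polynomial(equi))                   # noticeably larger
```

For $n = 5$ the Chebyshev points give the maximum $2^{-4} = 0.0625$, whereas the equally spaced points give a larger value.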


Relation (9-8) shows that the error curve for the best approximation of
a polynomial $Q(x) = x^n + \cdots$ by a polynomial of lower degree is given
by $2^{-n+1} T_n(x)$. The discussion in the proof of theorem 9.4 revealed that
this curve has $n + 1$ extrema in $[-1, 1]$ with alternating signs, but all of
the same absolute value. This property is shared by the polynomial $P$
minimizing

$$\max_{-1 \le x \le 1} |f(x) - P(x)|$$

where $f$ is any continuous function. The theory of such minimizing
polynomials was initiated by Chebyshev (1821-1894); it plays an
outstanding role in modern numerical computation.


Problems 
18. Determine the polynomial of degree $\le n - 1$ that best approximates

$$Q(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_n$$

on an arbitrary interval $a \le x \le b$, and show that the least value of the
maximum deviation is given by

$$|a_0|\, \frac{1}{2^{n-1}} \left(\frac{b - a}{2}\right)^{n}.$$

[Hint: Reduce the problem to the special case considered above by
introducing a new variable $x^*$ by setting

$$x = \frac{a + b}{2} + \frac{b - a}{2}\, x^*.]$$

19. Determine a polynomial of degree $\le 4$ that provides the best approximation
to the function $f(x) = x^5$ on the interval [0, 4].

20. Approximate $f(x) = x^3$ on the interval [0, 1] by a polynomial of degree 2.

21. Approximate $f(x) = x^3$ on the interval [0, 1] by a polynomial of degree 1
by approximating the approximating polynomial of problem 20 by a
linear polynomial.

22. Determine directly (by calculus) a linear polynomial $P(x) = ax + b$
such that the quantity

$$\max_{0 \le x \le 1} |x^3 - ax - b|$$

is minimized.

23. Prove the uniqueness of the solution (9-10) of the approximation problem
considered at the beginning of §9.4.


Recommended Reading 


A more general treatment of the error of Lagrangian interpolation is
given in Ostrowski [1960], chapter 1. For a discussion of the convergence
of sequences of interpolating polynomials see Hildebrand [1956], pp. 114-118.
A first introduction to the theorem of Weierstrass and to the
approximation of continuous functions by polynomials in general is
given in Todd [1963].


Research Problems 


1. Assuming that $f$ is sufficiently differentiable, how well does the
derivative of an interpolating polynomial approximate the derivative of
$f$? (For a partial answer, see §12.1.)

2. By extending the procedure outlined in problem 21, how well can you
approximate a polynomial of degree $n$ by one of arbitrary degree
$m < n$?


chapter 10 construction of the interpolating 


polynomial: methods using ordinates 


After considering the more theoretical aspects of the interpolating 
polynomial in chapter 9, we shall now discuss some algorithms for actually 
constructing the polynomial. Many such algorithms have been devised, 
frequently with some special purpose in mind. There are two main 
categories of such algorithms: In the first category, the function f enters 
through its values (or “ ordinates”) at all interpolation points. In the 
second category, f enters through its value at one point, and through 
differences of the function values. Here we are concerned with algorithms 
of the first category. 


10.1 Muller’s Method†


For some purposes the interpolating polynomial is calculated most 
conveniently from the Lagrangian formula (9-4). The Lagrangian 
formula is especially convenient if the polynomial is to be subjected to 
algebraic manipulations. As an example of an application of the 
Lagrangian representation, we shall discuss in more detail Muller's 
method for solving the equation /(x) = 0 mentioned in §4.11. 

The reader will recall that the essence of Muller’s method is as follows.
Assuming that three distinct approximations $x_{n-2}$, $x_{n-1}$, $x_n$ to the desired
solution s are available, we gain a new approximation by interpolating
the function f at the points $x_{n-2}$, $x_{n-1}$, $x_n$ by a (normally) quadratic
polynomial P. Of the (normally) two zeros of P, the one closest to $x_n$ is
selected as the new approximation $x_{n+1}$. The process then is continued
with $(x_{n-1}, x_n, x_{n+1})$ in place of $(x_{n-2}, x_{n-1}, x_n)$ and terminated as soon as
$|x_{n+1} - x_n| / |x_{n+1}|$ becomes less than some preassigned number.


† This section may be omitted at first reading.


The Lagrangian representation of P is 


$$P(x) = \frac{(x - x_{n-1})(x - x_{n-2})}{(x_n - x_{n-1})(x_n - x_{n-2})} f_n + \frac{(x - x_n)(x - x_{n-2})}{(x_{n-1} - x_n)(x_{n-1} - x_{n-2})} f_{n-1} + \frac{(x - x_n)(x - x_{n-1})}{(x_{n-2} - x_n)(x_{n-2} - x_{n-1})} f_{n-2}.$$


In order to write this in a more compact manner, we introduce the 
quantities 


(10-1)  $$h_n = x_n - x_{n-1}, \qquad h = x - x_n$$

and obtain

$$P(x) = P(x_n + h) = \frac{(h + h_n)(h + h_n + h_{n-1})}{h_n (h_n + h_{n-1})} f_n - \frac{h (h + h_n + h_{n-1})}{h_n h_{n-1}} f_{n-1} + \frac{h (h + h_n)}{(h_n + h_{n-1}) h_{n-1}} f_{n-2}.$$


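Collecting terms involving like powers of h and writing

(10-2)  $$q_n = \frac{h_n}{h_{n-1}}, \qquad q = \frac{h}{h_n},$$

we find

$$P(x) = P(x_n + q h_n) = (1 + q_n)^{-1} (A_n q^2 + B_n q + C_n),$$

where

(10-3)  $$\begin{aligned} A_n &= q_n f_n - q_n (1 + q_n) f_{n-1} + q_n^2 f_{n-2}, \\ B_n &= (2 q_n + 1) f_n - (1 + q_n)^2 f_{n-1} + q_n^2 f_{n-2}, \\ C_n &= (1 + q_n) f_n. \end{aligned}$$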


Solving the quadratic equation $P(x_n + q h_n) = 0$, we find

$$x_{n+1} = x_n + h_n q_{n+1},$$

where

$$q_{n+1} = \frac{-B_n \pm \sqrt{B_n^2 - 4 A_n C_n}}{2 A_n}.$$

In order to avoid loss of accuracy due to forming differences, this formula
is better written in the form

(10-4)  $$q_{n+1} = \frac{-2 C_n}{B_n \pm \sqrt{B_n^2 - 4 A_n C_n}}.$$

Here the sign yielding the smaller absolute value of $q_{n+1}$, i.e., the larger
absolute value of the denominator, should be chosen.

It may happen, of course, that the square root in (10-4) becomes
imaginary. If f is defined for real values only, the algorithm then breaks
down, and a new start must be made. If f is a polynomial, the possibility
of imaginary square roots is considered an advantage, since this will
automatically lead to approximations to complex zeros.

Three starting values $x_0$, $x_1$, $x_2$ for the algorithm have to be provided
from some other source. Muller recommends starting the algorithm by
taking for P the Taylor polynomial of degree 2 of f at x = 0. If f is a
polynomial,

$$f(x) = a_0 x^N + a_1 x^{N-1} + \cdots + a_N,$$

this can be achieved artificially by putting $x_0 = -1$, $x_1 = 1$, $x_2 = 0$, thus

(10-5)  $$h_1 = 2, \qquad h_2 = -1, \qquad q_2 = -\tfrac{1}{2},$$

and setting

(10-6)  $$\begin{aligned} f_0 &= a_N - a_{N-1} + a_{N-2}, \\ f_1 &= a_N + a_{N-1} + a_{N-2}, \\ f_2 &= a_N. \end{aligned}$$


As soon as a zero of f has been determined, it is to be divided out by 
algorithm 3.4 in connection with theorem 4.10. 

It follows from the work of Ostrowski ([1960], p. 86, although without
reference to Muller’s work) that Muller’s method converges whenever the
three initial approximations are sufficiently close to a simple zero of f.
The degree of convergence lies somewhere between that of the regula falsi
and of Newton’s method. No convergence theorems in the large similar
to those for the QD algorithm appear to be known. Nevertheless, the
method is (in the United States) among the most popular for finding zeros
of polynomials.
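
The formulas (10-1) through (10-4) translate almost line by line into code. The following Python sketch is an illustration only, not part of the original text: the function names `muller_step` and `muller`, the tolerance and the step limit are assumptions. Complex arithmetic is used throughout so that an imaginary square root leads into the complex plane, as described above. The case of a vanishing denominator in (10-4) is not handled.

```python
import cmath

def muller_step(x0, x1, x2, f0, f1, f2):
    """One Muller step from (x_{n-2}, x_{n-1}, x_n); returns x_{n+1}."""
    h_prev = x1 - x0                      # h_{n-1}
    h_n = x2 - x1                         # h_n, see (10-1)
    q_n = h_n / h_prev                    # q_n, see (10-2)
    # Coefficients (10-3)
    a = q_n * f2 - q_n * (1 + q_n) * f1 + q_n**2 * f0
    b = (2 * q_n + 1) * f2 - (1 + q_n)**2 * f1 + q_n**2 * f0
    c = (1 + q_n) * f2
    root = cmath.sqrt(b * b - 4 * a * c)  # may be imaginary
    # (10-4): choose the sign that gives the denominator of larger modulus
    den = b + root if abs(b + root) >= abs(b - root) else b - root
    return x2 + h_n * (-2 * c / den)

def muller(f, x0, x1, x2, tol=1e-12, max_steps=50):
    """Iterate until |x_{n+1} - x_n| <= tol * |x_{n+1}| (hypothetical defaults)."""
    f0, f1, f2 = f(x0), f(x1), f(x2)
    for _ in range(max_steps):
        x3 = muller_step(x0, x1, x2, f0, f1, f2)
        if abs(x3 - x2) <= tol * abs(x3):
            return x3
        x0, x1, x2 = x1, x2, x3
        f0, f1, f2 = f1, f2, f(x3)
    return x2

# x^2 + 1 has no real zero; real starting values still lead to i or -i.
zero = muller(lambda x: x * x + 1, 0.5, 1.0, 1.5)
```

Because the example function is itself quadratic, the interpolating polynomial coincides with it and the first step already lands on a zero.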


Problems

1. Use Muller’s method to find all zeros of the polynomial

$$p(x) = 128x^4 - 256x^3 + 160x^2 - 32x + 1.$$

(Real arithmetic may be used here.)

2. Use complex arithmetic to determine all zeros of the polynomial

$$p(x) = x^4 - 8x^3 + 39x^2 - 62x + 51$$

by Muller’s method.




10.2 The Lagrangian Representation for Equidistant Abscissas†


In the present section we assume that the points $x_k$, where the values of
the function f are given, are equally spaced. This is the case, for instance,
for most mathematical tables. If h denotes the distance between two
consecutive interpolating points, we then have

(10-7)  $$x_k = x_0 + kh,$$

where k = 0, ±1, ±2, .... We now introduce a new variable s by
means of the relation

(10-8)  $$x = x_0 + sh.$$

At $x = x_k$, s obviously has the value k. The variable s thus measures x
in units of h, starting at $x_0$.

We now consider the polynomial P of degree $n - m$ which interpolates
the function f at the points $x_m, x_{m+1}, \ldots, x_n$. Here m and n may be any
two integers such that $n \ge m$. (Ordinarily, we have $m \le 0$, $n \ge 0$.) By
(9-4), this polynomial is given by

$$P(x) = \sum_{k=m}^{n} L_k(x) f_k,$$

where

$$L_k(x) = \prod_{\substack{q=m \\ q \ne k}}^{n} \frac{x - x_q}{x_k - x_q}.$$

We now express P in terms of the variable s defined by (10-8). Evidently

$$x - x_q = (s - q)h,$$

and in particular,

$$x_k - x_q = (k - q)h.$$

If $P(x) = P(x_0 + sh) = p(s)$, we thus have

(10-9)  $$p(s) = \sum_{k=m}^{n} l_k(s) f_k,$$

where

$$l_k(s) = \prod_{\substack{q=m \\ q \ne k}}^{n} \frac{s - q}{k - q}.$$


The remarkable fact about this representation of the Lagrangian
polynomial is the independence of the functions $l_k(s)$ from h. These
functions, which may be called the normalized Lagrangian interpolation
coefficients, depend only on s (the relative location of x with respect to $x_0$
and $x_1$), and, of course, on the integers m and n, which define the set of
interpolating points.

† This section may be omitted at first reading.


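EXAMPLES

1. m = 0, n = 1. We have

$$l_0(s) = \prod_{q \ne 0} \frac{s - q}{0 - q} = \frac{s - 1}{0 - 1} = 1 - s, \qquad l_1(s) = \prod_{q \ne 1} \frac{s - q}{1 - q} = \frac{s}{1} = s.$$

We get, of course, the formula for linear interpolation, expressed as

$$p(s) = (1 - s) f_0 + s f_1.$$

2. A case which is frequently used in practice is given by m = -1,
n = 2. Here we find

$$\begin{aligned} l_{-1}(s) &= \frac{s(s - 1)(s - 2)}{(-1)(-2)(-3)} = -\frac{s(s - 1)(s - 2)}{6}, \\ l_0(s) &= \frac{(s + 1)(s - 1)(s - 2)}{2}, \\ l_1(s) &= l_0(1 - s), \\ l_2(s) &= l_{-1}(1 - s). \end{aligned}$$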

In view of the fact that the normalized Lagrangian interpolation
coefficients depend on one continuous variable only, extensive tables for
them have been prepared (National Bureau of Standards [1948]). Such
tables take into account symmetry properties such as the relation
$l_1(s) = l_0(1 - s)$ noted above.
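
As a small illustration of the product formula for $l_k(s)$, the following Python sketch evaluates the normalized coefficients in exact rational arithmetic. The function name `normalized_coeffs` and the sample value s = 1/4 are assumptions made for this example.

```python
from fractions import Fraction

def normalized_coeffs(m, n, s):
    """Return {k: l_k(s)} for k = m..n, with l_k(s) = prod_{q != k} (s - q)/(k - q)."""
    coeffs = {}
    for k in range(m, n + 1):
        value = Fraction(1)
        for q in range(m, n + 1):
            if q != k:
                value *= Fraction(s - q) / (k - q)
        coeffs[k] = value
    return coeffs

s = Fraction(1, 4)
l = normalized_coeffs(-1, 2, s)            # the case m = -1, n = 2 of example 2
mirrored = normalized_coeffs(-1, 2, 1 - s)
symmetric = (l[1] == mirrored[0]) and (l[2] == mirrored[-1])
```

With these values `symmetric` is true, which is the symmetry used in the tables. The coefficients do not depend on h, so the same numbers serve any equidistant table.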

We note some interesting algebraic relations between the normalized
interpolation coefficients. Let us consider the general case, where the
interpolating points are $x_m, x_{m+1}, \ldots, x_n$. The polynomial

$$P(x) = \sum_{k=m}^{n} L_k(x) f_k,$$

using $n - m + 1$ points, will furnish an exact representation of the
function f if f is a polynomial of degree $n - m$ or less. Thus it will be
exact, in particular, for the functions

$$f(x) = \left( \frac{x - x_0}{h} \right)^q, \qquad q = 0, 1, \ldots, n - m.$$


Since $f(x) = s^q$, we have $f_k = k^q$. Hence (10-9) yields the identities in s,

(10-10)  $$\sum_{k=m}^{n} l_k(s) k^q = s^q, \qquad q = 0, 1, \ldots, n - m.$$

These identities can be regarded as a system of $n - m + 1$ equations for
$n - m + 1$ unknowns $l_k(s)$. They thus could be used to calculate the $l_k$
numerically.

EXAMPLE

3. For m = 0, n = 2 the relations (10-10) take the form

$$\begin{aligned} l_0(s) + l_1(s) + l_2(s) &= 1, \\ l_1(s) + 2 l_2(s) &= s, \\ l_1(s) + 4 l_2(s) &= s^2. \end{aligned}$$


If f is not a polynomial of degree $\le n - m$, the error formula (9-5) still
stands. In the present situation,

$$L(x) = \prod_{k=m}^{n} (x - x_k),$$

and thus

$$L(x_0 + sh) = h^{n-m+1} \prod_{k=m}^{n} (s - k).$$

Thus the error formula now appears in the form

(10-11)  $$f(x_0 + sh) - p(s) = h^{n-m+1} \frac{f^{(n-m+1)}(\xi_s)}{(n - m + 1)!} \prod_{k=m}^{n} (s - k),$$

where $\xi_s$ is a point between the largest and the smallest of the numbers
$x_m$, $x_n$, x.

EXAMPLE

4. If linear interpolation is used as in example 1, we have for $0 \le s \le 1$

$$p(s) - f(x_0 + sh) = \frac{h^2}{2} f''(\xi_s) \, s(1 - s),$$

where $x_0 \le \xi_s \le x_1$.
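
Since $s(1 - s) \le 1/4$ on the interval, example 4 implies that the error of linear interpolation is at most $h^2/8$ times the maximum of $|f''|$. The short Python check below illustrates this for f = sin, where $|f''| \le 1$. The choice of function, of $x_0 = 1$ and of h = 0.1 is an assumption made only for this illustration.

```python
import math

def linear_interp_error(f, x0, h, s):
    """Return p(s) - f(x0 + s*h) for linear interpolation between x0 and x0 + h."""
    p = (1 - s) * f(x0) + s * f(x0 + h)
    return p - f(x0 + s * h)

x0, h = 1.0, 0.1
errors = [linear_interp_error(math.sin, x0, h, i / 10) for i in range(11)]
bound = h**2 / 8          # from example 4 with |f''| <= 1 and s(1 - s) <= 1/4
```

On this interval the second derivative of sin is negative, so by the formula of example 4 the chord lies below the curve and every entry of `errors` is nonpositive.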


Problems 


3. Use normalized Lagrangian interpolation coefficients to determine
$J_0(2.4068)$ by interpolation from the following values:

    x       J_0(x)
    2.1     0.16661
    2.3     0.05554
    2.5    -0.04838
    2.7    -0.14245


4. If the interpolating points are $x_m, x_{m+1}, \ldots, x_n$, prove that

$$l_k(s) = l_{n+m-k}(n + m - s), \qquad k = m, m + 1, \ldots, n.$$


5. Make a general statement about the signs of the $l_k(s)$ as a function of
m, n, k, and s.

6. If the interpolating points are $x_0, x_1, \ldots, x_n$, show that


aa Ky ἢ n+l = / 
ifs) = {-- Π) (eee Ae A νὰ νυνν ὩΣ Sse 


7. Assuming that the interpolating points are $x_m, x_{m+1}, \ldots, x_n$, find a closed
expression for the sum

$$\sum_{k=m}^{n} k^{n-m+1} l_k(s).$$

(Hint: Apply the error formula (10-11) to the function
$f(x) = (x - x_0)^{n-m+1}$.)


10.3 Aitken’s Lemma 


We now shall discuss certain algorithms that permit us to construct the 
interpolating polynomial recursively, without reference to the Lagrangian 
formula (9-4). The basic tool is a lemma which enables us to represent an 
interpolating polynomial of degree d + 1 in terms of two such polynomials
of degree d. 

Some special notation will be required. We again denote the points
at which the function f is to be interpolated by $x_0, x_1, x_2, \ldots, x_n$, and by
$f_k$ the value of f at $x_k$. We shall have to consider polynomials that
interpolate f at some, but not all of the points $x_0, x_1, \ldots, x_n$. If S is any
nonempty subset of $\{x_0, x_1, \ldots, x_n\}$, we denote by $P_S$ the polynomial
interpolating f at those $x_k$ which are in S. Thus, if S contains k + 1
points, $P_S$ is the unique polynomial of degree $\le k$ such that

$$P_S(x_i) = f_i, \qquad x_i \in S.$$
EXAMPLES

5. If S contains just one point $x_k$, then $P_S = f_k$.
6. $P_{\{x_1, x_2, x_5\}}$ denotes the polynomial interpolating at the points $x_1$, $x_2$, $x_5$.


Denoting by W the set of all interpolating points, we can state the 
following lemma: 


Lemma 10.3. Let S and T be two proper subsets of W having all
but the two points $x_i \in S$ and $x_j \in T$ in common. Then

(10-12)  $$P_{S \cup T}(x) = \frac{(x_i - x) P_T(x) - (x_j - x) P_S(x)}{x_i - x_j}$$

identically in x.

Here, as usual, $S \cup T$ denotes the union of the sets S and T.


Proof. Let the sets S and T contain m + 1 points each. Both
polynomials $P_S$ and $P_T$ interpolate at m + 1 points, hence are of degree $\le m$.
Denoting the expression on the right of (10-12) by P, we see that P has a
degree $\le m + 1$. Hence if we can show that P interpolates at all points
of $S \cup T$, then theorem 9.1 implies that $P = P_{S \cup T}$.

Let $x_k$ be a point of the intersection $S \cap T$ of S and T. By virtue of

$$P_S(x_k) = P_T(x_k) = f_k,$$

(10-12) yields

$$P(x_k) = \frac{(x_i - x_k) f_k - (x_j - x_k) f_k}{x_i - x_j} = f_k,$$

as desired. For $x = x_i$ we have

$$P(x_i) = \frac{-(x_j - x_i) P_S(x_i)}{x_i - x_j} = f_i,$$

and similarly for $x = x_j$

$$P(x_j) = \frac{(x_i - x_j) P_T(x_j)}{x_i - x_j} = f_j.$$

Thus P has been shown to interpolate at all points in $S \cup T$, completing
the proof.


EXAMPLES

7. If $S = \{x_0, x_2\}$, $T = \{x_0, x_5\}$, we obtain

$$P_{\{x_0, x_2, x_5\}}(x) = \frac{(x_2 - x) P_{\{x_0, x_5\}}(x) - (x_5 - x) P_{\{x_0, x_2\}}(x)}{x_2 - x_5}.$$

8. Lemma 10.3 is already familiar if the intersection $S \cap T$ is empty.
We then have, using example 5,

$$P_{\{x_i, x_j\}}(x) = \frac{(x_i - x) P_{\{x_j\}}(x) - (x_j - x) P_{\{x_i\}}(x)}{x_i - x_j} = \frac{(x_i - x) f_j - (x_j - x) f_i}{x_i - x_j}.$$

This is the familiar formula for linear interpolation.


Lemma 10.3 can be used in two ways. We may use it to get a formal
representation for the interpolating polynomial, or we may use it to
calculate the value of the polynomial for a given value of x. In the latter
case formula (10-12) requires dividing a sum of products by a single
number, an operation that can be performed on a desk computer without
writing down intermediate results.
Problems 


8. Obtain the Lagrangian formula for quadratic interpolation on the set
$\{x_0, x_1, x_2\}$ from the formulas for linear interpolation on the sets $\{x_0, x_1\}$
and $\{x_0, x_2\}$.

9. Prove the following generalization of lemma 10.3: Let S be an arbitrary
subset of $\{x_{m+1}, x_{m+2}, \ldots, x_n\}$ and let $S_k = \{x_k, S\}$, k = 1, 2, ..., m. If
$L_k(x)$ (k = 1, 2, ..., m) are the Lagrangian interpolating coefficients for
interpolation on the set $\{x_1, x_2, \ldots, x_m\}$, then

$$P_{\cup S_k}(x) = \sum_{k=1}^{m} L_k(x) P_{S_k}(x).$$


10.4 Aitken’s Algorithm 


Lemma 10.3 enables us to generate the interpolating polynomials of
higher degrees successively from polynomials of lower degrees. It still
leaves us considerable freedom in the choice of the sets S and T used to
finally obtain the polynomial $P_W$. Two standardized choices have become
widely used, one named after Aitken, the other named after Neville. In
both choices a triangular array of polynomials $P_{k,d}$ is generated. Here
$P_{k,d}$ is a certain polynomial of degree d that interpolates on a set of d + 1
points depending on k. Aitken’s scheme is as follows:


Algorithm 10.4  For d = 0, 1, ..., n, generate the polynomials $P_{k,d}$
as follows:

(10-13)  $$P_{k,0}(x) = f_k, \qquad k = 0, 1, \ldots, n;$$

(10-14)  $$P_{k,d+1}(x) = \frac{(x_k - x) P_{d,d}(x) - (x_d - x) P_{k,d}(x)}{x_k - x_d}, \qquad k = d + 1, d + 2, \ldots, n.$$

The arrangement of the polynomials $P_{k,d}$ is shown in scheme 10.4.

    x_k    d = 0      1          2         ...    n          x_k - x

    x_0    P_{0,0}                                            x_0 - x
    x_1    P_{1,0}    P_{1,1}                                 x_1 - x
    x_2    P_{2,0}    P_{2,1}    P_{2,2}                      x_2 - x
    ...
    x_k    P_{k,0}    P_{k,1}    P_{k,2}   ...                x_k - x
    ...
    x_n    P_{n,0}    P_{n,1}    P_{n,2}   ...    P_{n,n}     x_n - x

Scheme 10.4




The doubly underlined entry in scheme 10.4 is obtained by crosswise 
multiplication of the simply underlined entries. 


Theorem 10.4  In the notation of lemma 10.3,

$$P_{k,d} = P_{\{x_0, x_1, \ldots, x_{d-1}, x_k\}},$$

d = 0, 1, ..., n; k = d, d + 1, ..., n.

Proof. We use induction with respect to d. By (10-13), the assertion is
true for d = 0. Assuming it to be true for some $d \ge 0$, lemma 10.3
shows that the polynomial defined by (10-14) interpolates on the union of
the sets $\{x_0, x_1, \ldots, x_{d-1}, x_d\}$ and $\{x_0, x_1, \ldots, x_{d-1}, x_k\}$, that is, on the set
$\{x_0, x_1, \ldots, x_d, x_k\}$, proving our assertion for d increased by one. For
d = k = n we obtain

Corollary 10.4  $P_{n,n} = P_W$.

The rightmost entry in scheme 10.4 is the polynomial that interpolates
on the set of all points $x_0, x_1, \ldots, x_n$.
EXAMPLE

9. Let $f(x) = x^4$. We wish to calculate f(3) by interpolation at the
points -4, -2, 0, 2, 4. Scheme 10.4 looks as follows:

    x_k    P_{k,0}   P_{k,1}   P_{k,2}   P_{k,3}   P_{k,4}   x_k - x
    -4      256                                                -7
    -2       16      -584                                      -5
     0        0      -192       396                            -3
     2       16       -24       116       -24                  -1
     4      256       256       116       186        81         1
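
The recurrence (10-13), (10-14) can be carried out mechanically. The following Python sketch is an illustration, not part of the original text: the name `aitken_table` and the use of exact rational arithmetic are assumptions. It evaluates the array $P_{k,d}$ at a fixed x and is applied to the data of example 9.

```python
from fractions import Fraction

def aitken_table(xs, fs, x):
    """Return the triangular array P[k][d] of algorithm 10.4, evaluated at x."""
    n = len(xs) - 1
    P = [[None] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        P[k][0] = Fraction(fs[k])                        # (10-13)
    for d in range(n):
        for k in range(d + 1, n + 1):                    # (10-14)
            P[k][d + 1] = ((xs[k] - x) * P[d][d]
                           - (xs[d] - x) * P[k][d]) / (xs[k] - xs[d])
    return P

xs = [-4, -2, 0, 2, 4]
fs = [t**4 for t in xs]
table = aitken_table(xs, fs, 3)
last_row = table[4]          # the row for x_4 = 4
```

Each new column uses only the diagonal entry $P_{d,d}$ and the entry to its left in the same row, which is what makes the scheme convenient for hand or desk computation.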
Problems 


10. Use Aitken’s algorithm to obtain a value of $\sin \pi/4$ from the following
values of the function $f(x) = \sin x\pi/2$:

    x      -2   -1    0    1    2    3
    f(x)    0   -1    0    1    0   -1

11. Use algorithm 10.4 to determine $J_0(2.4068)$ by interpolation from the
values given in problem 3.


10.5 Neville’s Algorithm


In Neville’s use of lemma 10.3, the polynomials $P_{k,d}$ are built up in such
a manner that each polynomial interpolates on a set of points with d + 1
consecutive indices. The algorithm is as follows:


Algorithm 10.5  For d = 0, 1, ..., n, construct the polynomials
$P_{k,d}$ as follows:

(10-15)  $$P_{k,0}(x) = f_k, \qquad k = 0, 1, \ldots, n;$$

(10-16)  $$P_{k,d+1}(x) = \frac{(x_k - x) P_{k-1,d}(x) - (x_{k-d-1} - x) P_{k,d}(x)}{x_k - x_{k-d-1}}, \qquad k = d + 1, d + 2, \ldots, n.$$

The arrangement of the polynomials $P_{k,d}$ is the same as in scheme 10.4,
but the doubly underlined entry is now computed from asymmetrically
located entries as shown in scheme 10.5:

    x_k        d = 0        1            2           ...    n          x_k - x

    x_0        P_{0,0}                                                  x_0 - x
    x_1        P_{1,0}      P_{1,1}                                     x_1 - x
    x_2        P_{2,0}      P_{2,1}      P_{2,2}                        x_2 - x
    ...
    x_{k-2}    P_{k-2,0}    P_{k-2,1}    P_{k-2,2}                      x_{k-2} - x
    x_{k-1}    P_{k-1,0}    P_{k-1,1}    P_{k-1,2}                      x_{k-1} - x
    x_k        P_{k,0}      P_{k,1}      P_{k,2}                        x_k - x
    ...
    x_n        P_{n,0}      P_{n,1}      P_{n,2}     ...    P_{n,n}     x_n - x

Scheme 10.5


Theorem 10.5  In the notation of lemma 10.3, if the polynomials
$P_{k,d}$ are generated by algorithm 10.5,

$$P_{k,d} = P_{\{x_{k-d}, x_{k-d+1}, \ldots, x_k\}},$$

d = 0, 1, ..., n; k = d, d + 1, ..., n.

Proof. By (10-15), the assertion is true for d = 0. If it is true for some
$d \ge 0$, then it follows from (10-16) by virtue of lemma 10.3 that $P_{k,d+1}$
interpolates on the union of the sets $\{x_{k-d-1}, x_{k-d}, \ldots, x_{k-1}\}$ and
$\{x_{k-d}, x_{k-d+1}, \ldots, x_k\}$, i.e., on the set $\{x_{k-d-1}, x_{k-d}, \ldots, x_k\}$, proving the
assertion with d increased by one.

For d = k = n we have, in particular

Corollary 10.5  $P_{n,n} = P_W$.

Thus again, the rightmost polynomial in Neville’s scheme is the desired
polynomial interpolating at all points $x_0, x_1, \ldots, x_n$.


EXAMPLE

10. We again consider $f(x) = x^4$ and calculate f(3) from the values at
x = -4, -2, 0, 2, 4, using Neville’s algorithm. The following scheme
results:

    x_k    P_{k,0}   P_{k,1}   P_{k,2}   P_{k,3}   P_{k,4}   x_k - x
    -4      256                                                -7
    -2       16      -584                                      -5
     0        0       -24       396                            -3
     2       16        24        36       -24                  -1
     4      256       136       108        96        81         1
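
For comparison with Aitken’s scheme, the recurrence (10-15), (10-16) can be sketched in Python in the same way. The name `neville_table` and the use of exact rational arithmetic are assumptions made for this illustration. The data are those of example 10.

```python
from fractions import Fraction

def neville_table(xs, fs, x):
    """Return the triangular array P[k][d] of algorithm 10.5, evaluated at x."""
    n = len(xs) - 1
    P = [[None] * (n + 1) for _ in range(n + 1)]
    for k in range(n + 1):
        P[k][0] = Fraction(fs[k])                        # (10-15)
    for d in range(n):
        for k in range(d + 1, n + 1):                    # (10-16)
            P[k][d + 1] = ((xs[k] - x) * P[k - 1][d]
                           - (xs[k - d - 1] - x) * P[k][d]) / (xs[k] - xs[k - d - 1])
    return P

xs = [-4, -2, 0, 2, 4]
fs = [t**4 for t in xs]
table = neville_table(xs, fs, 3)
last_row = table[4]          # the row for x_4 = 4
```

Here every entry is built from its left neighbor and the entry above that neighbor, so each row depends only on the preceding row. This is the property exploited in §10.7.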


Problems 


12. Which entries in the schemes of the polynomials $P_{k,d}$ generated by the
algorithms 10.4 and 10.5 are necessarily identical?


13. Find an approximate value of 2 by interpolation, using Neville’s 


algorithm, from the values of the function $f(x) = 2^x$ at the points x = -2,
-1, 0, 1, 2, 3.
14, Calculate an approximate value of the infinite series 
l | 1 
tt get ae tae +: 
in the following manner: Let 
Aa |e diac dounggag) gegen 
ni pei = sf Ste ον τ ἐν 


and calculate /(0) by extrapolation from /(1), fG), fQ),..., using Neville’s 
algorithm. 


10.6 Inverse Interpolation 


Interpolation (approximately) solves the problem of finding the value of
y = f(x) when x is given. It does not solve the problem of finding x
when y = f(x) is given. (We could, of course, replace f by the
interpolating polynomial P and solve the equation y = P(x) for x, but in doing
so we would merely replace one problem by another problem of
comparable difficulty.) The problem can be easily solved, however, by
interchanging the roles of x and y. Speaking abstractly, this amounts to
interpolating the inverse function $f^{[-1]}$ instead of f itself. Speaking
concretely, it means interchanging the roles of the $x_k$ and the $f_k$. Since,
even for equidistant $x_k$, the corresponding values $f_k$ are not equidistant,
it is essential that we are able to calculate the interpolating polynomial for
nonequidistant interpolating points. As an example, we consider the
problem of solving f(x) = 0 when the function f is known at n + 1
distinct points $x_k$. If the polynomial P(y) interpolating the inverse
function at the points $f(x_k) = f_k$ is constructed by Aitken’s algorithm and
evaluated at y = 0, we obtain the following algorithm:
Algorithm 10.6  Let

$$X_{m,0} = x_m \qquad (m = 0, 1, \ldots, n)$$

and form for n = 0, 1, ..., m - 1 the numbers

$$X_{m,n+1} = \frac{f_m X_{n,n} - f_n X_{m,n}}{f_m - f_n}.$$

The approximate solution is given by $X_{n,n}$. The arrangement of
the triangular array of the values $X_{m,n}$ is as follows:

    f_0    X_{0,0}
    f_1    X_{1,0}    X_{1,1}
    f_2    X_{2,0}    X_{2,1}    X_{2,2}
    ...
    f_n    X_{n,0}    X_{n,1}    X_{n,2}    ...    X_{n,n}


Inverse interpolation is possible only if, in the range where interpolation
is used, x is a single-valued function of y. In the example depicted in
figure 10.6, where this condition is not satisfied, the interpolating
polynomial bears no relationship to the inverse function.
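
Algorithm 10.6 is Aitken’s scheme with the roles of abscissas and ordinates interchanged and with evaluation at y = 0. The following Python sketch illustrates this. The function name `inverse_interpolation_zero` and the test data are assumptions: the data are generated from a hypothetical function whose inverse is the cubic $y^3 + y + 2$, so that cubic inverse interpolation reproduces the zero exactly.

```python
from fractions import Fraction

def inverse_interpolation_zero(xs, fs):
    """Approximate a zero of f from the points x_m and values f_m = f(x_m).

    Builds the triangular array X[m][col] of algorithm 10.6; the f_m must be
    distinct, and x must be a single-valued function of y on the range used.
    """
    n = len(xs) - 1
    X = [[None] * (n + 1) for _ in range(n + 1)]
    for m in range(n + 1):
        X[m][0] = xs[m]
    for col in range(n):
        for m in range(col + 1, n + 1):
            X[m][col + 1] = ((fs[m] * X[col][col] - fs[col] * X[m][col])
                             / (fs[m] - fs[col]))
    return X[n][n]

# Hypothetical data: x = y^3 + y + 2, tabulated at y = -1, 1, 2, 3.
ys = [Fraction(-1), Fraction(1), Fraction(2), Fraction(3)]
xs = [y**3 + y + 2 for y in ys]
approx_zero = inverse_interpolation_zero(xs, ys)
```

For tabulated data the result is only an approximation, and its quality depends on the derivatives of the inverse function, as discussed next.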

The error of inverse interpolation obeys the same laws as the error of
ordinary interpolation. It depends on the derivatives of the inverse
function $f^{[-1]}(y)$. These derivatives can be calculated, in principle at
least, from the derivatives of the function f. Differentiating the identity

$$f^{[-1]}(f(x)) = x$$

we obtain

(10-17)  $$(f^{[-1]})'(f(x)) \, f'(x) = 1;$$

hence

$$(f^{[-1]})'(y) = \frac{1}{f'(x)}.$$

Higher derivatives can be obtained by repeatedly differentiating (10-17).
For instance,

$$(f^{[-1]})''(f(x)) \, [f'(x)]^2 + (f^{[-1]})'(f(x)) \, f''(x) = 0$$

shows that

$$(f^{[-1]})''(y) = -\frac{f''(x)}{[f'(x)]^3}.$$




Figure 10.6 


This process can be continued, but the results become more and more
complicated, even if the derivatives of f are simple.
Problem 


15. Using inverse interpolation, find an approximate value of the second zero
of the Bessel function $J_0(x)$ if the following values are given:

    x       J_0(x)
    5.2    -0.1102904
    5.4    -0.0412101
    5.6     0.0269709
    5.8     0.0917026


10.7 Iterated Inverse Interpolation 


Solving an equation f(x) = 0 by simple inverse interpolation as
described above is appropriate if the function f is known only at a
set of discrete points x (e.g., if f is a tabulated function). If f(x) can be
calculated for arbitrary x, it is possible to test the result of the inverse
interpolation procedure by evaluating f at the interpolated value of x.
In general, f(x) will not be exactly equal to 0. In this case, f(x) and x are
introduced as new entries in the interpolation table, and a new row of
values of X is calculated by inverse interpolation. The Neville form of
the interpolation table is especially appropriate here, because in it already
the first entries in a new horizontal row are good approximations to the
desired value of x. Beginning with two values of f, the Neville scheme is
continued systematically row by row as follows:


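Algorithm 10.7  Choose $x_0$ and $x_1$, and let

(10-18)  $$\begin{aligned} f_0 &= f(x_0), & X_{0,0} &= x_0, \\ f_1 &= f(x_1), & X_{1,0} &= x_1, \qquad X_{1,1} = \frac{f_1 X_{0,0} - f_0 X_{1,0}}{f_1 - f_0}. \end{aligned}$$

Then form the triangular array of numbers $X_{m,n}$ (m = 2, 3, ...;
n = 0, 1, ..., m) by means of the relations

$$X_{m,0} = X_{m-1,m-1}, \qquad f_m = f(X_{m,0}),$$

(10-19)  $$X_{m,n+1} = \frac{f_m X_{m-1,n} - f_{m-n-1} X_{m,n}}{f_m - f_{m-n-1}}, \qquad n = 0, 1, \ldots, m - 1.$$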


If f is sufficiently differentiable in a neighborhood of a solution s of
f(x) = 0, if $f'(s) \ne 0$, and if $x_0$ and $x_1$ are sufficiently close to s, it is
intuitively clear that the numbers $X_{m,n}$ converge to s for $m \to \infty$.

As described above, the table of the values $X_{m,n}$ extends farther and
farther to the right with every new row. Ultimately, little extra accuracy
will be gained from the entries far to the right, because they depend on
values $X_{k,0}$ with small k which presumably are poor approximations to the
desired solution. It is therefore advisable not to increase the degrees of
the inverse interpolating polynomial beyond a certain degree d (say
d = 2 or d = 3), that is, to truncate the table of values $X_{m,n}$ after the dth
column. This means that the formulas (10-19) are only used for $m \le d$;
for m > d they are to be replaced by the following:

$$X_{m,0} = X_{m-1,d}, \qquad f_m = f(X_{m,0}),$$

(10-20)  $$X_{m,n+1} = \frac{f_m X_{m-1,n} - f_{m-n-1} X_{m,n}}{f_m - f_{m-n-1}}, \qquad n = 0, 1, \ldots, d - 1.$$

The convergence of this modified version of algorithm 10.7 is, under
suitable conditions, proved by Ostrowski ([1960], chapter 13).

The case d = 2 of the modified algorithm 10.7 is very similar to Muller’s
method discussed in §10.1, except that now the inverse function rather
than the function itself is interpolated by a quadratic polynomial. From
the computational point of view it would even seem to be superior to
Muller’s method since it does not require the evaluation of square roots.
However, just for this reason it lacks the advantage of automatically
branching off into the complex domain if no real zeros are found.
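
The truncated version of algorithm 10.7 needs only the previous row of the table and the list of function values. The following Python sketch is one possible arrangement and is not taken from the original text. The function name, the stopping rule (relative change of successive first-column entries) and the default values are assumptions. No safeguard against equal function values in the denominator is included.

```python
def iterated_inverse_interpolation(f, x0, x1, d=2, tol=1e-12, max_rows=50):
    """Iterated inverse interpolation, table truncated after column d."""
    fvals = [f(x0), f(x1)]
    # Row 1 of the table, see (10-18): [X_{1,0}, X_{1,1}]
    row = [x1, (fvals[1] * x0 - fvals[0] * x1) / (fvals[1] - fvals[0])]
    for m in range(2, max_rows + 1):
        x_new = row[-1]                  # X_{m,0}: last entry of the previous row
        f_new = f(x_new)                 # f_m
        fvals.append(f_new)
        if f_new == 0 or abs(x_new - row[0]) <= tol * abs(x_new):
            return x_new
        new_row = [x_new]
        for n in range(min(m, d)):       # (10-19) for m <= d, (10-20) for m > d
            f_old = fvals[m - n - 1]
            new_row.append((f_new * row[n] - f_old * new_row[n]) / (f_new - f_old))
        row = new_row
    return row[-1]

approx_root = iterated_inverse_interpolation(lambda x: x * x - 2, 1.0, 2.0)
```

With d = 2 each new row costs one evaluation of f and two divisions, and no square roots are required.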


Problems 


16. Using repeated quadratic inverse interpolation, find the root of the
polynomial

$$P(x) = 70x^4 - 140x^3 + 90x^2 - 20x + 1$$

located between 0.6 and 0.7. (Use Horner’s scheme to evaluate the
polynomial.)

17. Show that iterated linear inverse interpolation is identical with the regula
falsi (see §4.11).


Recommended Reading 


The practical aspects of interpolation are dealt with in a volume issued
by the Nautical Almanac Service [1956]. The theory of a number of
processes for solving f(x) = 0 by methods based on interpolation is dealt
with very thoroughly by Ostrowski [1960].


Research Problems 


1. How can the regula falsi be extended to the solution of systems of 
more than one equation? (For some pertinent remarks, see Ostrowski 
[1960], p. 146.) 

2. Develop a theory for interpolating functions of two variables by
bilinear polynomials of the form a + bx + cy + dxy.


chapter 11 construction of the interpolating polynomial: methods using differences


The representations of the interpolating polynomial discussed in chapter 10 
were based directly on the values of the interpolated function. They do 
not convey any information, explicit or implicit, concerning the error of 
the interpolating polynomial. In this respect, the methods to be discussed 
in the present chapter do somewhat better. These representations are 
based on differences of the sequence of function values, and on certain 
properties of binomial coefficients. 


11.1 Differences and Binomial Coefficients 


Differences of a sequence of numbers were already defined in §4.4 and
§6.9. We now introduce differences of a function f defined on a suitable
interval. Let h > 0 be a constant. The function $\Delta f$ whose value at x is
given by

$$\Delta f(x) = f(x + h) - f(x)$$

is called the first (forward) difference of the function f. It obviously
depends on the step h, although this fact is usually not made evident in the
notation. Higher differences are defined inductively by the relation

$$\Delta^k f = \Delta(\Delta^{k-1} f), \qquad k = 2, 3, \ldots.$$

For instance,

$$\Delta^2 f(x) = f(x + 2h) - 2 f(x + h) + f(x).$$

For symmetry we put

$$\Delta^0 f = f.$$


An induction argument entirely analogous to that employed in the proof
of (6-27) shows that

$$\Delta^k f(x) = f(x + kh) - \binom{k}{1} f(x + (k - 1)h) + \binom{k}{2} f(x + (k - 2)h) - \cdots + (-1)^k f(x).$$

If, for integral k, $x_k = x_0 + kh$, and if we write

$$f(x_k) = f_k,$$

the differences thus introduced produce the same result as the difference
operator $\Delta$ introduced in §4.4, if the latter is applied to the sequence of
values $\{f_k\}$ of f.
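
The agreement between repeated differencing and the closed formula with binomial coefficients is easy to check numerically. In the following Python sketch the function names and the sample sequence $f_k = k^3$ are assumptions chosen for illustration.

```python
from math import comb

def forward_differences(values):
    """Return [Δ^0 f_0, Δ^1 f_0, ..., Δ^n f_0] by repeated differencing of f_0..f_n."""
    diffs = []
    row = list(values)
    while row:
        diffs.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return diffs

def difference_by_formula(values, k):
    """Δ^k f_0 from the closed formula: sum over j of (-1)^j C(k, j) f_{k-j}."""
    return sum((-1) ** j * comb(k, j) * values[k - j] for j in range(k + 1))

values = [k**3 for k in range(6)]          # f_k = k^3 with x_0 = 0, h = 1
diffs = forward_differences(values)
```

For a cubic the differences of order four and higher vanish, which the list `diffs` shows.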

It is to be expected that the differences of a function share many
properties, and have many connections, with the derivatives of the
function. For instance, the mean value theorem of differential calculus states
that, for some $\xi$ between x and x + h,

$$\Delta f(x) = h f'(\xi).$$

We shall soon become acquainted with a generalization of this relation to
differences and derivatives of arbitrary order.

In differential calculus, a set of functions enjoying particularly simple
properties with respect to differentiation is the set of monomials $x^n/n!$,
n = 0, 1, .... In fact,

(11-1)  $$\frac{d}{dx} \left( \frac{x^n}{n!} \right) = \frac{x^{n-1}}{(n - 1)!}.$$

In difference calculus, an analogous role is played by the binomial
coefficients

(11-2)  $$\binom{s}{n} = \frac{s(s - 1) \cdots (s - n + 1)}{n!}.$$

Here s is any real (or even complex) number, and n is a positive integer.
For n = 0, the symbol (11-2) is defined to be 1, for negative integers n,
zero. It is always understood in the following that the operator $\Delta$ acts on
the variable s, and that the step h = 1 is implied. With this understanding
we have

(11-3)  $$\Delta \binom{s}{n} = \binom{s}{n - 1}.$$


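This is trivially true for n = 0; for n > 0 we have

$$\begin{aligned} \Delta \binom{s}{n} &= \binom{s + 1}{n} - \binom{s}{n} \\ &= \frac{(s + 1) s (s - 1) \cdots (s - n + 2) - s (s - 1) \cdots (s - n + 1)}{n!} \\ &= \frac{s (s - 1) \cdots (s - n + 2) \, [(s + 1) - (s - n + 1)]}{n \, (n - 1)!} \\ &= \binom{s}{n - 1}, \end{aligned}$$

as desired. By induction it follows immediately from (11-3) that

(11-4)  $$\Delta^k \binom{s}{n} = \binom{s}{n - k}, \qquad k = 0, 1, 2, \ldots.$$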
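
Relation (11-4) holds for nonintegral s as well, which can be checked in exact arithmetic. The following Python sketch is an illustration. The helper names `binom` and `delta` and the sample argument s = 5/2 are assumptions.

```python
from fractions import Fraction
from math import factorial

def binom(s, n):
    """Generalized binomial coefficient (11-2): 1 for n = 0, zero for n < 0."""
    if n < 0:
        return Fraction(0)
    value = Fraction(1)
    for j in range(n):
        value *= s - j
    return value / factorial(n)

def delta(g, k=1):
    """Return the k-th forward difference of the function g, with step 1."""
    if k == 0:
        return g
    return delta(lambda s: g(s + 1) - g(s), k - 1)

s = Fraction(5, 2)
third_difference = delta(lambda t: binom(t, 4), 3)(s)   # should equal binom(s, 1)
```

The value `third_difference` equals s itself, in agreement with (11-4) for n = 4, k = 3.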

Another property of the monomials $x^n/n!$ also carries over to the
binomial coefficients. It is trivial that every polynomial of degree n can
be expressed as a linear combination of the monomials

$$1, \ \frac{x}{1!}, \ \frac{x^2}{2!}, \ \ldots, \ \frac{x^n}{n!}.$$

Similarly, such a polynomial can also be expressed as a linear combination
of

$$1, \ \binom{s}{1}, \ \binom{s}{2}, \ \ldots, \ \binom{s}{n}.$$

In fact, a somewhat more general statement is true. If $\{a_1, a_2, \ldots, a_n\}$ is an
arbitrary set of real numbers, then a polynomial of degree n is expressible
in terms of the generalized monomials

$$1, \ \frac{x + a_1}{1!}, \ \frac{(x + a_2)^2}{2!}, \ \ldots, \ \frac{(x + a_n)^n}{n!}.$$

This fact has the following analog:


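Theorem 11.1  Let $a_1, a_2, \ldots, a_n$ be n arbitrary real numbers, and let
p be any polynomial of degree n. Then there exist constants $A_0$,
$A_1, \ldots, A_n$ such that

(11-5)  $$p(s) = A_0 + A_1 \binom{s + a_1}{1} + A_2 \binom{s + a_2}{2} + \cdots + A_n \binom{s + a_n}{n}$$

identically in s.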


Proof. Evidently the statement of the theorem is true for n = 0; we
proceed by induction with respect to n and assume that the theorem is
true for some nonnegative integer n - 1. If

$$p(s) = b_0 s^n + b_1 s^{n-1} + \cdots + b_n,$$

then the polynomial

(11-6)  $$q(s) = p(s) - b_0 \, n! \binom{s + a_n}{n}$$

is of degree n - 1, since the leading coefficient of

$$\binom{s + a_n}{n} \quad \text{is} \quad \frac{1}{n!}.$$

By the induction hypothesis, q can be represented in the form

$$q(s) = A_0 + A_1 \binom{s + a_1}{1} + \cdots + A_{n-1} \binom{s + a_{n-1}}{n - 1}.$$

Solving (11-6) for p(s) we obtain a representation of the desired form
(11-5), where $A_n = b_0 \, n!$.
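
The induction in the proof is constructive: the highest coefficient is determined first, the corresponding term is subtracted, and the process is repeated with the remainder. The following Python sketch follows this idea. The function names and the representation of polynomials by lists of coefficients in ascending order are assumptions made for this illustration.

```python
from fractions import Fraction
from math import factorial

def shifted_binomial_poly(a, n):
    """Ascending coefficients of C(s + a, n) = (s + a)(s + a - 1)...(s + a - n + 1)/n!."""
    coeffs = [Fraction(1)]
    for j in range(n):
        c = Fraction(a - j)                      # multiply by the factor (s + a - j)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for i, v in enumerate(coeffs):
            new[i] += c * v
            new[i + 1] += v
        coeffs = new
    return [v / factorial(n) for v in coeffs]

def binomial_representation(p, a):
    """p: ascending coefficients of a polynomial of degree n; a: [a_1, ..., a_n].

    Returns [A_0, ..., A_n] such that p(s) = sum of A_k * C(s + a_k, k), as in (11-5).
    """
    q = [Fraction(v) for v in p]
    n = len(q) - 1
    A = [Fraction(0)] * (n + 1)
    for k in range(n, -1, -1):
        A[k] = q[k] * factorial(k)               # leading coefficient times k!
        basis = shifted_binomial_poly(a[k - 1] if k > 0 else 0, k)
        for i in range(k + 1):
            q[i] -= A[k] * basis[i]              # remainder has degree k - 1
    return A

coeffs_square = binomial_representation([0, 0, 1], [0, 0])   # p(s) = s^2, a_1 = a_2 = 0
```

For $p(s) = s^2$ with all $a_k = 0$ this gives $s^2 = \binom{s}{1} + 2\binom{s}{2}$.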
EXAMPLES 
1. A special case of theorem 11.1 was used in example 13 of chapter 6, 
where we obtained the formula 


v= (9+) () 


2. The truth of corollary 6.8 (which was not proved in §6.8) follows from
the special case $a_1 = a_2 = \cdots = a_n = 0$ of the above theorem 11.1.


Problems 


1. Determine all functions f that are defined on the whole real line and satisfy 


$$\Delta f(x) = h c f(x)$$
identically in x, where c is a constant. 

2. If f and g are two differentiable functions, find a formula for 4(fg) and 
derive the product rule for differentiation. 

3. Formulate an algorithm for obtaining the representation (11-5) for a
given polynomial $p(s) = b_0 s^n + \cdots + b_n$ and for given constants
$a_1, a_2, \ldots, a_n$. (Determine $A_n$ first.)

4. Represent the polynomials p(s) = s" (n = 1,2,...) in the form (11-5), 
where a, = 1,2..... 

5. Find a closed expression for the differences

$$\Delta^k f_0, \qquad k = 0, 1, 2, \ldots,$$

if $f(x) = e^x$ and $x_0 = 0$. Show that in this case

$$\lim_{h \to 0} h^{-k} \Delta^k f_0 = f^{(k)}(0).$$

11.2 Finalized Representations of Sequences of Interpolating Polynomials 


We now return to the problem of constructing the polynomials
interpolating a function f on a given set of points. The interpolating points
are a set of equidistant points

$$x_k = x_0 + kh,$$

where the integer k may be positive or negative. As usual we write

$$f_k = f(x_k).$$

It will be convenient to express the interpolating polynomial in terms of
the variable

(11-7)  $$s = \frac{x - x_0}{h}.$$

Thus, if S is a set of interpolating points $x_k$, and $P_S$ denotes the polynomial
interpolating f at the points of S, we shall define p by

$$p(s) = P_S(x) = P_S(x_0 + sh).$$

The polynomial p is characterized by the property that

$$p(k) = f_k \quad \text{whenever } x_k \in S;$$

for nonintegral values of s, p(s) is to be regarded as an approximation to
$f(x_0 + sh)$.

Actually, we are now not merely interested in constructing a single
polynomial, interpolating f on a single set S. Instead, we shall try to
determine sequences of interpolating polynomials that interpolate f on a
sequence of sets $S_0, S_1, S_2, \ldots$ of interpolating points. These sets are
defined as follows:

Let $m_0, m_1, m_2, \ldots$ be a nonincreasing sequence of integers such that

(11-8)  $$0 \le m_k - m_{k+1} \le 1$$

for all k = 0, 1, 2, ..., and let

$$S_k = \{x_{m_k}, x_{m_k + 1}, \ldots, x_{m_k + k}\},$$

k = 0, 1, 2, .... The set $S_k$ thus contains precisely k + 1 consecutive
interpolating points, beginning with the point $x_{m_k}$. By virtue of (11-8),
each set $S_k$ contains the preceding set $S_{k-1}$, and thus all preceding sets
$S_{k-2}, \ldots, S_0$.


EXAMPLES

3. Let $m_k = 0$, k = 0, 1, 2, .... Then

$$S_k = \{x_0, x_1, \ldots, x_k\}.$$

4. If $m_k = -k$, k = 0, 1, 2, ..., we have

$$S_k = \{x_{-k}, x_{-k+1}, \ldots, x_0\}.$$

5. Setting $m_0 = m_1 = 0$, $m_2 = m_3 = -1$, ..., we obtain the sets

$$S_{2k} = \{x_{-k}, x_{-k+1}, \ldots, x_k\}, \qquad S_{2k+1} = \{x_{-k}, x_{-k+1}, \ldots, x_{k+1}\}, \qquad k = 0, 1, 2, \ldots.$$

6. Setting $m_0 = 0$, $m_1 = m_2 = -1$, $m_3 = m_4 = -2$, ..., the sets $S_k$ are
given by

$$S_{2k} = \{x_{-k}, x_{-k+1}, \ldots, x_k\}, \qquad S_{2k+1} = \{x_{-k-1}, x_{-k}, \ldots, x_k\}.$$

By the fundamental theorem 9.1, there exists, for every k = 0, 1, 2, ...,
a unique polynomial $P_{S_k}$ of degree $\le k$ such that

$$P_{S_k}(x_m) = f_m, \qquad x_m \in S_k.$$

If x and s are connected by (11-7), we shall write

$$p_k(s) = P_{S_k}(x).$$

We now wish to consider the following problem: Given a sequence of
integers $\{m_n\}$ satisfying (11-8) (and hence a sequence of sets $S_0, S_1, \ldots$),
determine two sequences of real numbers $\{a_n\}$ and $\{A_n\}$ such that, for every
n = 0, 1, 2, ...,

(11-9)  $$p_n(s) = A_0 + A_1 \binom{s + a_1}{1} + \cdots + A_n \binom{s + a_n}{n}$$

identically in s.

It is not clear at all that this problem has a solution. Theorem 11.1
merely tells us that, for an arbitrary sequence $\{a_n\}$ and for every fixed
integer n, constants $A_0, A_1, \ldots, A_n$ can be found such that (11-9) holds.
It is to be expected, however, that if n is replaced by n + 1, the constants
$A_0, A_1, \ldots, A_n$ already found will have to be replaced by other constants.
If we wish to obtain a “finalized” representation of the sequence $\{p_n\}$
with the property that the constants $A_k$, once determined, remain
unchanged, we can hope to do so only by a judicious choice of the sequence
$\{a_n\}$.

EXAMPLE

7. Let $a_n = n$, n = 1, 2, .... The function f(s) = s is interpolated at
s = 0 by $p_0(s) = 0$. This is a representation of the form (11-9) with
$A_0 = 0$. In order to interpolate f at s = 0 and s = 1, we must take

$$p_1(s) = s = -1 + \binom{s + 1}{1}.$$

The last expression is again of the form (11-9), but we now have $A_0 = -1$.
The coefficient $A_0 = 0$ is preserved if and only if we choose $a_1 = 0$.


Let us now investigate the properties which the sequence $\{a_n\}$ must have if the above problem is to have a solution. For an arbitrary integer $n \ge 0$, consider the two polynomials

$p_n(s) = A_0 + A_1\binom{s+a_1}{1} + \cdots + A_n\binom{s+a_n}{n}$

and

$p_{n+1}(s) = A_0 + A_1\binom{s+a_1}{1} + \cdots + A_n\binom{s+a_n}{n} + A_{n+1}\binom{s+a_{n+1}}{n+1}.$

The polynomial $p_n$ interpolates on the set $S_n$, $p_{n+1}$ on the set $S_{n+1}$. Since $S_n$ is contained in $S_{n+1}$, both polynomials interpolate on the set $S_n$. Both thus have identical values for $x_s$ in the set $S_n$. This means that the last term

$A_{n+1}\binom{s+a_{n+1}}{n+1}$

must vanish whenever $s$ is equal to one of the integers

(11-10)  $m_n,\ m_n + 1,\ \ldots,\ m_n + n.$


If $f$ is such that $p_{n+1}$ has degree $n + 1$, then $A_{n+1} \ne 0$, and the required condition is

(11-11)  $\binom{s + a_{n+1}}{n+1} = 0$

for all said integers. The binomial coefficient in (11-11) is zero if and only if $s$ is one of the numbers

$-a_{n+1},\ -a_{n+1} + 1,\ \ldots,\ -a_{n+1} + n.$

Evidently the set of these numbers coincides with the set (11-10) if and only if

(11-12)  $a_{n+1} = -m_n, \qquad n = 0, 1, 2, \ldots.$

This condition fully determines the sequence $\{a_1, a_2, \ldots\}$ as a function of the sequence $\{m_n\}$.

There remains the problem of determining the constants $A_k$. For a fixed value of $n$, there certainly exist, by theorem 11.1, constants $A_0, A_1, \ldots, A_n$ such that

$p_n(s) = A_0 + A_1\binom{s - m_0}{1} + \cdots + A_n\binom{s - m_{n-1}}{n}.$

The question is whether these $A_k$ are independent of $n$. In order to determine $A_k$ for $0 \le k \le n$, we form the $k$th difference of $p_n(s)$. By (11-4), we get

(11-13)  $\Delta^k p_n(s) = A_k + A_{k+1}\binom{s - m_k}{1} + \cdots + A_n\binom{s - m_{n-1}}{n-k}.$

In this identity we set $s = m_k$. The values of $p_n$ involved in forming the difference $\Delta^k p_n(s)$ then are the values $p_n(s)$ for $s = m_k, m_k + 1, \ldots, m_k + k$, i.e., those values of $s$ for which $x_s \in S_k$. Since $S_k \subset S_n$, we have

$p_n(s) = f_s$

for these values, and hence

$\Delta^k p_n(m_k) = \Delta^k f_{m_k}.$

In the expression on the right of (11-13), all binomial coefficients are zero for $s = m_k$, as it follows from (11-8) that

$0 \le m_k - m_{k+l} \le l, \qquad l = 0, 1, 2, \ldots.$

We thus obtain

$A_k = \Delta^k f_{m_k},$


and it turns out that $A_k$ is independent of $n$, the degree of the interpolating polynomial, as we had hoped. The polynomials $p_n$ solving the problem posed initially thus are given by

(11-14)  $p_n(s) = \sum_{k=0}^{n} \Delta^k f_{m_k}\binom{s - m_{k-1}}{k}$

(for $k = 0$ the binomial coefficient equals 1). They can be generated recursively by the following simple algorithm:

Algorithm 11.2 If $\{m_n\}$ is a sequence of integers satisfying (11-8), let $p_0(s) = f_{m_0}$, and for $n = 0, 1, 2, \ldots$,

(11-15)  $p_{n+1}(s) = p_n(s) + \Delta^{n+1} f_{m_{n+1}}\binom{s - m_n}{n+1}.$
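The recursion of algorithm 11.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `gen_binom`, `forward_diff`, and `interpolate` are invented here, the ordinates $f_k$ are stored in a dictionary keyed by the integer index $k$, and the sequence $m$ is assumed to start with $m_0 = 0$ and to produce nested sets $S_k$.

```python
from math import comb

def gen_binom(s, k):
    """Binomial coefficient s(s-1)...(s-k+1)/k! for real s and integer k >= 0."""
    value = 1.0
    for j in range(k):
        value *= (s - j) / (j + 1)
    return value

def forward_diff(f, k, i):
    """k-th forward difference of the ordinates, Delta^k f_i."""
    return sum((-1) ** (k - j) * comb(k, j) * f[i + j] for j in range(k + 1))

def interpolate(f, m, s):
    """Evaluate p_n(s), n = len(m) - 1, by the recursion (11-15).

    Assumption of this sketch: m[0] == 0 and the sets S_k are nested.
    """
    p = f[m[0]]
    for n in range(len(m) - 1):
        p += forward_diff(f, n + 1, m[n + 1]) * gen_binom(s - m[n], n + 1)
    return p

# Ordinates of a cubic on the grid x_k = k (x_0 = 0, h = 1).
f = {k: k**3 - 2 * k + 1 for k in range(-3, 4)}

newton_forward = interpolate(f, [0, 0, 0, 0], 0.3)
gauss_forward = interpolate(f, [0, 0, -1, -1], 0.3)
gauss_backward = interpolate(f, [0, -1, -1, -2], 0.3)
```

Because the data come from a cubic and each sequence uses four points, all three choices of $\{m_n\}$ should reproduce the cubic itself, so the three results agree up to rounding.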


By construction, we have

$p_n(k) = f(x_k), \qquad k = m_n, m_n + 1, \ldots, m_n + n.$

We wish to find an expression for the error of $p_n(s)$ if $s$ is not equal to one of the above values of $k$. This is easily possible in our new notation. According to theorem 9.2 the difference

$f(x) - P_{S_n}(x) = f(x) - p_n(s)$

can, if $f$ has a continuous derivative of order $n + 1$, be written in the form

$\frac{(x - x_{m_n})(x - x_{m_n+1}) \cdots (x - x_{m_n+n})}{(n+1)!}\, f^{(n+1)}(\xi_x),$

where $\xi_x$ is some point in the smallest interval containing $x$, $x_{m_n}$, and $x_{m_n+n}$. We have

$x - x_{m_n+k} = h[s - (m_n + k)], \qquad k = 0, 1, \ldots, n.$

The product of the factors $x - x_{m_n+k}$ appearing above thus can be written

$h^{n+1}(n+1)!\binom{s - m_n}{n+1},$

and we obtain

Theorem 11.2 Let the function $f$ have a continuous $(n + 1)$st derivative on an interval containing the points $x = x_0 + sh$, $x_{m_n}$, and $x_{m_n+n}$. If $p_n(s)$ is defined by algorithm 11.2, then for a suitable point $\xi_x$ of that interval

(11-16)  $f(x) = p_n(s) + h^{n+1} f^{(n+1)}(\xi_x)\binom{s - m_n}{n+1}.$

A comparison of the equations (11-15) and (11-16) shows that the correction term which has to be added to $p_n(s)$ in order to obtain the exact value of $f$ is of the same form as the term that has to be added in order to pass from $p_n$ to $p_{n+1}$, with the exception that

$\Delta^{n+1} f_{m_{n+1}}$ is to be replaced by $h^{n+1} f^{(n+1)}(\xi_x)$.

If the function $f^{(n+1)}$ does not change very rapidly, these two terms are of the same order of magnitude (see the problems 11 and 12). One can thus say with some justification that the error of $p_n$ is of the order of the first omitted term in the sum (11-14). Thus, if the degree of the interpolating polynomial is not fixed beforehand, one may hope to obtain an accurate representation of $f$ (to within rounding errors) by extending the sum (11-14) through such a value of $n$ that the omitted terms are insignificant.


EXAMPLES

We shall construct the sequences of interpolating polynomials corresponding to the sequences $\{m_n\}$ considered in the examples 3, 4, 5, and 6.

8. For $m_n = 0$, $n = 0, 1, 2, \ldots$ we obtain the polynomials

$p_n(s) = f_0 + \binom{s}{1}\Delta f_0 + \binom{s}{2}\Delta^2 f_0 + \cdots + \binom{s}{n}\Delta^n f_0,$

interpolating on the sets $\{x_0, x_1, \ldots, x_n\}$.

9. For $m_n = -n$, $n = 0, 1, 2, \ldots$ we obtain

$p_n(s) = f_0 + \binom{s}{1}\Delta f_{-1} + \binom{s+1}{2}\Delta^2 f_{-2} + \cdots + \binom{s+n-1}{n}\Delta^n f_{-n}.$

These polynomials $p_n$ interpolate on the set $\{x_0, x_{-1}, x_{-2}, \ldots, x_{-n}\}$.

The formulas obtained in examples 8 and 9 are known, respectively, as the Newton forward and the Newton backward formula.

10. Letting

$\{m_0, m_1, m_2, \ldots\} = \{0, 0, -1, -1, \ldots\},$

we get the polynomials

$p_n(s) = f_0 + \binom{s}{1}\Delta f_0 + \binom{s}{2}\Delta^2 f_{-1} + \binom{s+1}{3}\Delta^3 f_{-1} + \binom{s+1}{4}\Delta^4 f_{-2} + \cdots$  ($n + 1$ terms)

interpolating on the sets

$\{x_{-k}, x_{-k+1}, \ldots, x_k\}$ for $n = 2k$

and

$\{x_{-k}, x_{-k+1}, \ldots, x_{k+1}\}$ for $n = 2k + 1$.

11. Taking $\{m_0, m_1, m_2, \ldots\} = \{0, -1, -1, -2, -2, \ldots\}$, we obtain the polynomials

$p_n(s) = f_0 + \binom{s}{1}\Delta f_{-1} + \binom{s+1}{2}\Delta^2 f_{-1} + \binom{s+1}{3}\Delta^3 f_{-2} + \binom{s+2}{4}\Delta^4 f_{-2} + \cdots$  ($n + 1$ terms).

They interpolate on the sets $\{x_{-k}, x_{-k+1}, \ldots, x_k\}$ for $n = 2k$ and $\{x_{-k-1}, x_{-k}, \ldots, x_k\}$ for $n = 2k + 1$.

The formulas obtained in the examples 10 and 11 are known, respectively, as the Gauss forward and the Gauss backward formula.


Problems 


6. Forming differences of the values of the function $J_0$ given in problem 15, chapter 10, find $J_0(5.5)$ by the four interpolation formulas given above.

7. Using the fact that

$\binom{-s}{k} = (-1)^k\binom{s+k-1}{k}$

and expressing forward differences by backward differences (see §6.9), show that the Newton backward formula can be written in the form

$p_n(s) = f_0 - \binom{-s}{1}\nabla f_0 + \binom{-s}{2}\nabla^2 f_0 - \cdots + (-1)^n\binom{-s}{n}\nabla^n f_0.$

8. Expressing ordinates in terms of differences. Establish the formula

$f_k = \sum_{m=0}^{k}\binom{k}{m}\Delta^m f_0, \qquad k = 0, 1, 2, \ldots,$

(a) by induction; (b) by considering it as a special case of the Newton formula.

9. Using the fact that the identity of problem 8 must hold for arbitrary values of $f_0, f_1, \ldots$, show that for arbitrary integers $n$ and $k$ such that $0 \le n \le k$


Zire naa gee 


10. Study the convergence of the infinite series

$\sum_{n=0}^{\infty}\binom{s}{n}\Delta^n f_0$

(Newton’s formula extended to infinitely many terms), where $f(x) = e^x$, as it depends on $x$ and $h$. (Use problem 5 and apply the ratio test.)

11. Let $f$ be $n$ times continuously differentiable on a suitable interval. Show that for some $\xi \in (x_0, x_n)$

$\Delta^n f_0 = h^n f^{(n)}(\xi).$

[Differentiate the function

$f(x) - \sum_{k=0}^{n}\binom{s}{k}\Delta^k f_0, \qquad s = \frac{x - x_0}{h},$

$n$ times with respect to $x$ and apply Rolle’s theorem.]

12. As an application of the preceding problem, show that

$\lim_{h \to 0} h^{-n}\Delta^n f_0 = f^{(n)}(x_0)$

for any sufficiently differentiable function $f$.

13. Assume that the values of $f_n$ are known only up to rounding errors $\varepsilon_n$, where $|\varepsilon_n| \le \varepsilon$. Show that the maximum error in $\Delta^k f_n$ can be as large as $2^k\varepsilon$.


11.3 Some Special Interpolation Formulas

In spite of their basic simplicity the interpolation formulas of Newton and Gauss given in §11.2 are not frequently used in practice, mainly because of their lack of formal symmetry. More frequently used in practical interpolation are certain formulas named after Stirling, Bessel, and Everett.

The elegant formulation of these formulas requires the introduction of two new operators $\mu$ and $\delta$ in addition to the forward difference operator $\Delta$. The operator $\mu$ is defined by

(11-17)  $\mu f(x) = \frac{1}{2}\left[f\left(x + \frac{h}{2}\right) + f\left(x - \frac{h}{2}\right)\right]$

and consequently is called the averaging operator. The operator $\delta$ is defined by

(11-18)  $\delta f(x) = f\left(x + \frac{h}{2}\right) - f\left(x - \frac{h}{2}\right)$

and is called the central difference operator. Note that $\delta$ can always be expressed in terms of $\Delta$, and vice versa. For instance,

$\Delta^{2k} f_{m-k} = \delta^{2k} f_m, \qquad k = 0, 1, 2, \ldots.$

A further identity to be noted is

$\mu\delta f_0 = \frac{1}{2}(\Delta f_{-1} + \Delta f_0) = \frac{1}{2}(f_1 - f_{-1}).$


Stirling’s formula. For a given even integer $n = 2k$ both the Gauss forward and the Gauss backward formula yield the interpolating polynomial corresponding to the set $\{x_{-k}, x_{-k+1}, \ldots, x_k\}$. Thus also their arithmetic mean must yield the same polynomial. The resulting polynomial

$p_{2k}(s) = f_0 + \binom{s}{1}\frac{1}{2}[\Delta f_{-1} + \Delta f_0] + \frac{1}{2}\left[\binom{s}{2} + \binom{s+1}{2}\right]\Delta^2 f_{-1} + \binom{s+1}{3}\frac{1}{2}[\Delta^3 f_{-2} + \Delta^3 f_{-1}] + \cdots + \frac{1}{2}\left[\binom{s+k-1}{2k} + \binom{s+k}{2k}\right]\Delta^{2k} f_{-k}$

can by virtue of the identity

$\frac{1}{2}\left[\binom{s+k-1}{2k} + \binom{s+k}{2k}\right] = \frac{s}{2k}\binom{s+k-1}{2k-1}$

be written in the form

(11-19)  $p_{2k}(s) = f_0 + \binom{s}{1}\left[\mu\delta f_0 + \frac{s}{2}\delta^2 f_0\right] + \cdots + \binom{s+k-1}{2k-1}\left[\mu\delta^{2k-1} f_0 + \frac{s}{2k}\delta^{2k} f_0\right].$

This formula, called Stirling’s formula, expresses the polynomial interpolating at an odd number of equidistant points in terms of central differences at the center point. It is preferably used for interpolation near that center point.

Bessel’s formula. If $n$ is odd, $n = 2k + 1$, the Gaussian forward formula at $x_0$ and the Gaussian backward formula at $x_1$ interpolate on the same set of points $S_{2k+1} = \{x_{-k}, x_{-k+1}, \ldots, x_{k+1}\}$. The Gaussian backward formula centered at $x_1$ is given by

$p_{2k+1}(s) = f_1 + \binom{s'}{1}\Delta f_0 + \binom{s'+1}{2}\Delta^2 f_0 + \cdots + \binom{s'+k}{2k+1}\Delta^{2k+1} f_{-k},$

where

$s' = \frac{x - x_1}{h} = \frac{x - x_0 - h}{h} = s - 1.$

Averaging this expression with the Gaussian forward formula at $x_0$ we get, after some simplification,

(11-20)  $p_{2k+1}(s) = \mu f_{1/2} + \left(s - \frac{1}{2}\right)\delta f_{1/2} + \binom{s}{2}\left[\mu\delta^2 f_{1/2} + \frac{1}{3}\left(s - \frac{1}{2}\right)\delta^3 f_{1/2}\right] + \cdots + \binom{s+k-1}{2k}\left[\mu\delta^{2k} f_{1/2} + \frac{1}{2k+1}\left(s - \frac{1}{2}\right)\delta^{2k+1} f_{1/2}\right].$

This formula is known as Bessel’s formula. It expresses the interpolating polynomial for an even number of consecutive points in terms of central differences. It is preferably used for interpolation halfway between the two center points.

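Everett’s formula. We start from Gauss’ forward formula, where $n$ is odd, $n = 2k + 1$. Eliminating differences of odd order by use of the formula

$\Delta^{2k+1} f_{-k} = \Delta^{2k} f_{-k+1} - \Delta^{2k} f_{-k},$

we obtain

$p_{2k+1}(s) = f_0 + \binom{s}{1}(f_1 - f_0) + \binom{s}{2}\Delta^2 f_{-1} + \binom{s+1}{3}(\Delta^2 f_0 - \Delta^2 f_{-1}) + \cdots + \binom{s+k-1}{2k}\Delta^{2k} f_{-k} + \binom{s+k}{2k+1}(\Delta^{2k} f_{-k+1} - \Delta^{2k} f_{-k}).$

Collecting equal differences, expressing them in terms of the central difference operator and using the identity

$\binom{s+k-1}{2k} - \binom{s+k}{2k+1} = \binom{s+k-1}{2k}\left(1 - \frac{s+k}{2k+1}\right) = -\binom{s+k-1}{2k+1} = \binom{t+k}{2k+1},$

where

$t = 1 - s,$

we obtain the formula

(11-21)  $p_{2k+1}(s) = \binom{t}{1} f_0 + \binom{t+1}{3}\delta^2 f_0 + \cdots + \binom{t+k}{2k+1}\delta^{2k} f_0 + \binom{s}{1} f_1 + \binom{s+1}{3}\delta^2 f_1 + \cdots + \binom{s+k}{2k+1}\delta^{2k} f_1$

due to the British astronomer Everett.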
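As a small numerical illustration of Everett’s formula (11-21), the following Python sketch evaluates it from tabulated ordinates. The function names `gen_binom`, `central_diff_even`, and `everett`, and the dictionary layout of the ordinates, are assumptions made for this example only.

```python
from math import comb

def gen_binom(s, k):
    """Binomial coefficient s(s-1)...(s-k+1)/k! for real s and integer k >= 0."""
    value = 1.0
    for j in range(k):
        value *= (s - j) / (j + 1)
    return value

def central_diff_even(f, order, i):
    """Central difference delta^order f_i for even order (order 0 returns f_i)."""
    half = order // 2
    return sum((-1) ** l * comb(order, l) * f[i + half - l] for l in range(order + 1))

def everett(f, s, k):
    """Evaluate (11-21): the polynomial of degree 2k + 1 at x = x_0 + s*h."""
    t = 1.0 - s
    total = 0.0
    for j in range(k + 1):
        total += gen_binom(t + j, 2 * j + 1) * central_diff_even(f, 2 * j, 0)
        total += gen_binom(s + j, 2 * j + 1) * central_diff_even(f, 2 * j, 1)
    return total

# Ordinates of a cubic on the grid x_k = k (x_0 = 0, h = 1).
f = {k: k**3 - 2 * k + 1 for k in range(-3, 4)}

with_second_differences = everett(f, 0.3, 1)   # cubic polynomial
with_fourth_differences = everett(f, 0.3, 2)   # quintic polynomial
```

With second differences only ($k = 1$) the formula already yields the cubic interpolating polynomial, so for cubic data both calls return the exact function value.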

Of the three formulas given above, the highly symmetrical and elegant 
formula due to Everett has found great favor in practice. It has the 
advantage of using only differences of even order, and of furnishing an 
interpolation polynomial whose degree (and consequently, accuracy) is 
higher than the order of the highest difference employed. For instance, 
with a column of second differences alone we can calculate the cubic 
interpolating polynomial, whereas the application of all other formulas 
requires three difference columns. For these reasons many tables of 
higher transcendental functions, if they give any differences at all, give 
second differences only. No special tables of the Everett interpolation coefficients

$\binom{s+k}{2k+1}, \qquad k = 1, 2, \ldots,$

are required, as these coefficients are identical with the extreme (first and last) Lagrangian interpolation coefficients interpolating on the set $x_{-k}, x_{-k+1}, \ldots, x_{k+1}$ (see §10.2).


11.4 Throwback†


Throwback of higher differences into lower differences is an extremely simple but ingenious device, due to Comrie, which enhances the accuracy of interpolation formulas without increasing the required numerical work. The idea is quite general; we explain it in the simplest possible situation.

† This section may be omitted at first reading.


Everett’s formula for the interpolating polynomial of degree 5 may be written

$p_5(s) = t f_0 + \binom{t+1}{3}\left\{\delta^2 f_0 + \frac{(t+2)(t-2)}{4 \cdot 5}\,\delta^4 f_0\right\}$

plus a similar term with $t$ replaced by $s$ and the subscript 0 replaced by 1. In the interval $0 \le t \le 1$ the factor

$\frac{(t+2)(t-2)}{4 \cdot 5} = \frac{t^2 - 4}{20}$

varies between the narrow limits $-(4/20)$ and $-(3/20)$. Thus if we define modified second differences $\delta_m^2 f_k$ by the formula

$\delta_m^2 f_k = \delta^2 f_k - \frac{7}{40}\,\delta^4 f_k,$

then Everett’s formula with fourth differences can be approximated by the formula

$p_5^*(s) = t f_0 + \binom{t+1}{3}\delta_m^2 f_0 + s f_1 + \binom{s+1}{3}\delta_m^2 f_1,$

where $t = 1 - s$. This formula can be used in the same way as the formula involving second differences only. The error committed in replacing $p_5$ by $p_5^*$ is equal to

$\binom{t+1}{3}\left[-\frac{7}{40} - \frac{t^2 - 4}{20}\right]\delta^4 f_0 + \binom{s+1}{3}\left[-\frac{7}{40} - \frac{s^2 - 4}{20}\right]\delta^4 f_1 = \frac{1}{40}\left\{(1 - 2t^2)\binom{t+1}{3}\delta^4 f_0 + (1 - 2s^2)\binom{s+1}{3}\delta^4 f_1\right\}.$

For $0 \le s \le 1$ (and consequently $0 \le t \le 1$) this turns out to be less than

$0.00122 \max\{|\delta^4 f_0|, |\delta^4 f_1|\}.$

Thus if one unit in the least significant digit carried in the computation is denoted by $u$, we have

$|p_5^*(s) - p_5(s)| \le \frac{u}{2}$

already if the fourth differences are less than $400u$.

Many mathematical tables giving second differences print modified 
instead of ordinary second differences. In forming the modified differ- 
ences, the factor —(7/40) is frequently replaced by —0.184, a value 
suggested by Comrie from a consideration of Bessel’s formula. Tables 
with modified second differences make it possible to calculate (with an
error of less than one round-off error) the fifth degree interpolating polynomial
from only one difference column, with an amount of work
comparable to that for the third degree polynomial. 

EXAMPLE 

12. According to the tables of Jahnke and Emde [1945], $x = 11.620$ is a zero of the Bessel function $J_2(x)$. We check this statement by evaluating $J_2(11.620)$, using the following values given in the British Association tables (modified second differences in units of $10^{-8}$):

x        J_2(x)          δ_m² J_2(x)

11.4     0.05118808
11.5     0.02793593
11.6     0.00461559      15622
11.7    -0.01854910      37729
11.8    -0.04133747
11.9    -0.06353402

With $x_0 = 11.6$, we have $s = 0.2$, $t = 0.8$,

$\binom{t+1}{3} = -0.048, \qquad \binom{s+1}{3} = -0.032,$

yielding $J_2(11.620) = -0.00003692$. The derivative of $J_2(x)$ satisfies

$J_2'(x) = \frac{1}{2}[J_1(x) - J_3(x)]$

and thus, again from tables, has near the suspected zero the approximate value $-0.23$. From Newton’s formula, we thus expect

$11.62 - \frac{-0.00003692}{-0.23} = 11.61984$

to be a more accurate value of the desired zero and indeed, Watson [1944] lists the desired zero as 11.6198412.
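The arithmetic of example 12 can be retraced with a short Python sketch. It assumes the throwback form of Everett’s formula with modified second differences as reconstructed from this section, takes the tabulated values at 11.6 and 11.7 from the example, and uses the quoted derivative value $-0.23$; the helper name `gen_binom` and the variable names are invented for the illustration.

```python
def gen_binom(s, k):
    """Binomial coefficient s(s-1)...(s-k+1)/k! for real s and integer k >= 0."""
    value = 1.0
    for j in range(k):
        value *= (s - j) / (j + 1)
    return value

# Tabulated values from example 12; modified second differences are
# given in units of 1e-8.
f0, f1 = 0.00461559, -0.01854910        # J_2(11.6), J_2(11.7)
d2m0, d2m1 = 15622e-8, 37729e-8         # modified second differences

s = 0.2                                  # (11.620 - 11.6) / 0.1
t = 1.0 - s

# Everett's formula with modified second differences (throwback).
value = (t * f0 + gen_binom(t + 1, 3) * d2m0
         + s * f1 + gen_binom(s + 1, 3) * d2m1)

# One Newton step with the approximate derivative -0.23 quoted in the example.
zero_estimate = 11.62 - value / (-0.23)
```

Evaluating this by hand gives `value` close to $-0.00003692$ and `zero_estimate` close to 11.61984, in line with the figures of the example.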


Problems 


14. For the interpolation problem discussed in example 12, estimate
(a) the error $J_2(x) - p_5(s)$ due to pure interpolation;
(b) the error $p_5^*(s) - p_5(s)$ due to throwback.
(For (a), use the integral representation of problem 9, chapter 9, to estimate the high derivative of $J_2$ required.)

15. Proceeding as in the derivation of Everett’s formula, obtain an interpolation formula that uses only zeroth, third, sixth, ... differences.

16. Devise a method for “throwing back the second differences into the zeroth” in Everett’s formula, and estimate the error involved. Why is the method less efficient than the one discussed in the text? When would it be useful?

17. The zeros of the Bessel function $J_0(x)$ are known to approach the points

$x = \left(n - \frac{1}{4}\right)\pi, \qquad n = 1, 2, 3, \ldots.$

Verify this statement by evaluating $J_0\left(\left(n - \frac{1}{4}\right)\pi\right)$, $n = 1, 2, 3, \ldots$, using a table of Bessel functions.

18. Check the values of the modified differences given in example 12 by forming ordinary second and fourth differences.


Recommended Reading 


Finite difference techniques have been pushed to an especially high level 
in Great Britain. Books such as Fox [1957] as well as the volumes issued 
by the National Physical Laboratory [1961] and by the Nautical Almanac 
Service [1956] contain much excellent advice on interpolation by differences.
On the whole, however, the subject of interpolation of tables is
somewhat in the eclipse due to the fact that most tables have been
replaced by prestored programs in digital computers. 


chapter 12 numerical differentiation


Above we have used the interpolating polynomial to approximate values 
of a function f at points where f is not known. Another use of the 
interpolating polynomial, of equal or even higher importance in practice, 
is the imitation of the fundamental operations of calculus. In all these 
applications the basic idea is extremely simple: Instead of performing the 
operation on the function $f$, which may be difficult or (in cases where $f$ is
known at discrete points only) impossible, the operation is performed on a
suitable interpolating polynomial. In the present chapter this program is 
carried out for the operation of differentiation. 


12.1 The Error of Numerical Differentiation 


Let $f$ be a function defined on an interval $I$ containing the set of points $S = \{x_0, x_1, \ldots, x_n\}$ (not necessarily equidistant) and let $P_S$ be the polynomial interpolating $f$ at the points of the set $S$. We seek to approximate

$f'(x)$ by $P_S'(x)$, $x \in I$,

and wish to derive a formula for the error that must be expected in this approximation.

It seems natural to obtain an expression for the error $f' - P_S'$ by differentiating the error formula (9-5). If $f$ has a continuous derivative of order $n + 1$ in $I$ and if $x \in I$, it was shown in §9.2 that

(12-1)  $f(x) - P_S(x) = L(x)g(x),$

where

(12-2)  $L(x) = (x - x_0)(x - x_1) \cdots (x - x_n),$

(12-3)  $g(x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi_x),$

$\xi_x$ being some unspecified point in the smallest interval containing $x$ and all the points $x_i$, $i = 0, 1, \ldots, n$. Corollary 9.2 shows that, although no assertion can be made about the continuity or differentiability of $\xi_x$ as a function of $x$, the function $g$ can be extended to a function that is continuous on $I$. A similar consideration shows that the extended function is even $n$ times continuously differentiable on $I$. Hence we obtain by differentiating (12-1)

(12-4)  $f'(x) - P_S'(x) = L'(x)g(x) + L(x)g'(x).$


If $x$ is arbitrary, this expression is not of much use for the purpose of estimating $f' - P_S'$, since we lack a convenient explicit representation for $g'$ such as (12-3). However, if $x = x_j$ ($j = 0, 1, \ldots, n$) in (12-4), we obtain by virtue of $L(x_j) = 0$

$f'(x_j) - P_S'(x_j) = L'(x_j)g(x_j).$

Recalling that

$L'(x_j) = \lim_{x \to x_j}\frac{L(x)}{x - x_j} = \lim_{x \to x_j}\prod_{i \ne j}(x - x_i) = \prod_{i=0,\, i \ne j}^{n}(x_j - x_i),$

we can state the above result as follows:

Theorem 12.1 Let the function $f$ be continuous and $n + 1$ times continuously differentiable on an interval $I$ containing the $n + 1$ distinct points $x_0, x_1, \ldots, x_n$, and let $P_S$ denote the polynomial of degree $\le n$ interpolating $f$ at these points. Then for each $j$, $j = 0, 1, \ldots, n$, the interval spanned by the largest and the smallest of the points $x_i$ contains a point $\xi_j$ such that

(12-5)  $f'(x_j) - P_S'(x_j) = \frac{f^{(n+1)}(\xi_j)}{(n+1)!}\prod_{i=0,\, i \ne j}^{n}(x_j - x_i).$
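A quick numerical check of (12-5) can be sketched in Python for the simplest symmetric case. The choices here are assumptions of the illustration: $f(x) = e^x$, the three nodes $-h, 0, h$, and differentiation at the middle node $x_j = 0$, where the derivative of the interpolating parabola equals $(f(h) - f(-h))/(2h)$.

```python
from math import exp

h = 0.1

# Derivative at the middle node of the parabola through -h, 0, h.
p_prime = (exp(h) - exp(-h)) / (2 * h)
error = 1.0 - p_prime                 # f'(0) - P'(0) for f(x) = exp(x)

# (12-5) with n = 2 and x_j = 0: error = f'''(xi)/3! * (0 + h) * (0 - h)
# for some xi between -h and h, so the error must lie between these bounds.
lower = -h**2 / 6 * exp(h)
upper = -h**2 / 6 * exp(-h)
```

Since $f''' = e^x$ is increasing, the observed error has to fall between `lower` and `upper`, which is what the theorem predicts.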


Problems 


1. For a sufficiently differentiable function $f$, $f'(0)$ is approximated by differentiating the polynomial $P$ interpolating $f$ at the points $0, h, 2h, \ldots, nh$. Give a formula for the error $f'(0) - P'(0)$.

2. Indicate a lower bound for the error if the derivative $f'(0)$ of $f(x) = e^x$ is replaced by the derivative of the polynomial interpolating $f$ at the points
3,1 ὕὌὕὖ 

3. Give a formula for the error of numerically differentiating $f$ at $x = 0$ by differentiating the interpolating polynomial using the points $-h, 0, h$. Obtain the same error formula by subtracting the Taylor expansion for $f(-h)$ from that for $f(h)$, both terminated with the term involving $h^3$.


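12.2 Numerical Differentiation Formulas for Equidistant Abscissas


We now shall assume that the points $x_k$ are equidistant, $x_k = x_0 + kh$, $k = 0, \pm 1, \pm 2, \ldots$. For given $m$ and $n > m$,

$\prod_{i=m,\, i \ne j}^{n}|x_j - x_i|$

is smallest when $x_j$ lies halfway between the extreme points $x_m$ and $x_n$. This suggests the use of Stirling’s formula (11-19)

$p_{2k}(s) = f_0 + \binom{s}{1}\left[\mu\delta f_0 + \frac{s}{2}\delta^2 f_0\right] + \cdots + \binom{s+k-1}{2k-1}\left[\mu\delta^{2k-1} f_0 + \frac{s}{2k}\delta^{2k} f_0\right],$

where $s = (x - x_0)/h$, for numerical differentiation. The derivative with respect to $x$ at $x_0$ equals $h^{-1}$ times the derivative with respect to $s$ at $s = 0$. The derivatives of the coefficients

$\frac{s}{2k}\binom{s+k-1}{2k-1}, \qquad k = 1, 2, \ldots,$

are zero, since they contain the factor $s^2$. For the remaining coefficients we find

$\left[\frac{d}{ds}\binom{s+k-1}{2k-1}\right]_{s=0} = \lim_{s \to 0}\frac{1}{s}\binom{s+k-1}{2k-1} = \lim_{s \to 0}\frac{(s+k-1)(s+k-2) \cdots (s+1)(s-1) \cdots (s-k+1)}{(2k-1)!} = (-1)^{k-1}\frac{[(k-1)!]^2}{(2k-1)!}.$

The derivative of the polynomial $P_{2k}(x) = p_{2k}((x - x_0)/h)$ at the central point $x_0$ can thus be written

(12-6)  $P_{2k}'(x_0) = \frac{1}{h}\left\{\mu\delta f_0 - \frac{(1!)^2}{3!}\mu\delta^3 f_0 + \cdots + (-1)^{k-1}\frac{[(k-1)!]^2}{(2k-1)!}\mu\delta^{2k-1} f_0\right\}.$

Or, evaluating the numerical coefficients,

$P_{2k}'(x_0) = \frac{1}{h}\left\{\mu\delta f_0 - \frac{1}{6}\mu\delta^3 f_0 + \frac{1}{30}\mu\delta^5 f_0 - \frac{1}{140}\mu\delta^7 f_0 + \cdots\right\}.$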
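The numerical coefficients in (12-6) can be reproduced exactly with rational arithmetic. The following Python sketch is only a check of the closed form $(-1)^{k-1}[(k-1)!]^2/(2k-1)!$; the function name `stirling_derivative_coeff` is invented for the illustration.

```python
from fractions import Fraction
from math import factorial

def stirling_derivative_coeff(k):
    """Coefficient of mu delta^(2k-1) f_0 in (12-6), as an exact fraction."""
    return Fraction((-1) ** (k - 1) * factorial(k - 1) ** 2, factorial(2 * k - 1))

# Coefficients for k = 1, 2, 3, 4.
coeffs = [stirling_derivative_coeff(k) for k in range(1, 5)]
```

For $k = 1, \ldots, 4$ this gives $1, -1/6, 1/30, -1/140$, matching the coefficients written out above.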


Formulas in terms of ordinates can be found by expressing the central differences $\delta^{2k-1} f_{-1/2}$ and $\delta^{2k-1} f_{1/2}$ in the form $\sum c_n f_n$ and arranging the coefficients $c_n$ in a table, as follows:

Table 12.2a

n               -1       0       1
δf_{-1/2}       -1       1       0
δf_{1/2}         0      -1       1
μδf_0          -1/2      0      1/2

Table 12.2b

n               -2      -1       0       1       2
δ³f_{-1/2}      -1       3      -3       1       0
δ³f_{1/2}        0      -1       3      -3       1
μδ³f_0         -1/2      1       0      -1      1/2

The average of the coefficients in each column gives us the corresponding coefficient for $\mu\delta^{2k-1} f_0$. For the second and fourth degree polynomials we find in this manner

(12-7)  $P_2'(x_0) = \frac{1}{2h}(f_1 - f_{-1}),$

(12-8)  $P_4'(x_0) = \frac{1}{h}\left\{\frac{1}{2}(f_1 - f_{-1}) - \frac{1}{12}(f_2 - 2f_1 + 2f_{-1} - f_{-2})\right\} = \frac{1}{12h}\{-f_2 + 8f_1 - 8f_{-1} + f_{-2}\}.$
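The two ordinate formulas (12-7) and (12-8) translate directly into code. The Python sketch below is an illustration only; the function names `d_three_point` and `d_five_point` and the test function $\sin x$ at $x_0 = 1$ are choices made for this example.

```python
from math import sin, cos

def d_three_point(f, x0, h):
    """Formula (12-7): derivative of the interpolating parabola at x0."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

def d_five_point(f, x0, h):
    """Formula (12-8): derivative of the interpolating quartic at x0."""
    return (-f(x0 + 2 * h) + 8 * f(x0 + h)
            - 8 * f(x0 - h) + f(x0 - 2 * h)) / (12 * h)

x0, h = 1.0, 0.1
err3 = abs(d_three_point(sin, x0, h) - cos(x0))
err5 = abs(d_five_point(sin, x0, h) - cos(x0))

# The five-point formula is exact (up to rounding) for polynomials of degree <= 4.
quartic_check = d_five_point(lambda x: x**4, 1.0, 0.1)
```

For a smooth function the five-point error is markedly smaller than the three-point error at the same step, and a quartic is differentiated exactly apart from rounding.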


An expression for the error $f'(x_0) - P_{2k}'(x_0)$ is easily determined from (12-5). With $n = 2k$, and the interpolating points arranged equidistantly and symmetrically about $x_0$, we find

$\prod_{i=-k,\, i \ne 0}^{k}(x_0 - x_i) = (-1)^k (k!)^2 h^{2k}.$

It thus follows that

(12-9)  $f'(x_0) - P_{2k}'(x_0) = (-1)^k\frac{(k!)^2}{(2k+1)!}\, h^{2k} f^{(2k+1)}(\xi),$

where $x_{-k} < \xi < x_k$. As in the calculation of approximate function values by algorithm 11.2, it is thus true that the error committed in differentiating Stirling’s polynomial in place of the function $f$ equals the first omitted term in the difference formula, provided that $\mu\delta^{2k+1} f_0$ is replaced by $h^{2k+1} f^{(2k+1)}(\xi)$.


Problems 


4. By using the Newton forward formula, devise a formula for numerical differentiation at the beginning of a table and obtain an error formula similar to (12-9).

5. Making use of Bessel’s formula, obtain a formula for numerical differentiation halfway between two entries in a table, and write the formulas resulting from the polynomials of degree one and three in terms of ordinates.

6. By differentiating Stirling’s form of the interpolating polynomial twice, show that

$f''(x_0) \approx \frac{1}{h^2}\left\{\delta^2 f_0 - \frac{1}{12}\delta^4 f_0 + \cdots\right\}.$

Obtain a general formula for the coefficients on the right.

7. Suppose that, due to rounding, the values $f_n$ are known only up to errors $\varepsilon_n$, where $|\varepsilon_n| \le \varepsilon$. What is the maximum error resulting therefrom in the formulas (12-7) and (12-8), if all arithmetic operations are performed without rounding error?

8. Suppose we calculate $J_0'(x)$ by means of (12-7) from a table of $J_0(x)$ giving six decimals. What is the smallest value of $h$ for which we can guarantee that the maximum possible discretization error (due to replacing $J_0$ by $P$) is not exceeded by the maximum possible error due to rounding?


12.3 Extrapolation to the Limit


The formulas derived in §12.2 can also be obtained by an entirely different method which recalls the fundamental principle of numerical analysis applied already in the derivation of Aitken’s $\Delta^2$-method: Improve the accuracy of an approximation using any (possibly incomplete) knowledge of the asymptotic behavior of the error.

The simplest numerical differentiation procedure consists in replacing

$f'(x_0)$ by $\frac{1}{2h}(f_1 - f_{-1})$

(see formula (12-7) above). We introduce an abbreviation for the expression on the right that emphasizes its dependence on the step $h$ by defining the basic differentiation operator $D_h$ as follows:

(12-10)  $D_h f(x_0) = \frac{1}{2h}[f(x_0 + h) - f(x_0 - h)].$

The error formula (12-9) then states in the special case $k = 1$ that

(12-11)  $D_h f(x_0) - f'(x_0) = \frac{h^2}{6} f'''(\xi),$

where $x_0 - h < \xi < x_0 + h$.

In order to obtain a more accurate statement about the error of $D_h f$, we use Taylor’s expansion with remainder term. If the function $f$ is sufficiently differentiable, we have for $k = 0, 1, \ldots$

$f(x + h) = f(x) + \frac{h}{1!} f'(x) + \cdots + \frac{h^k}{k!} f^{(k)}(x) + \frac{h^{k+1}}{(k+1)!} f^{(k+1)}(\xi),$

where $\xi$ is a point between $x$ and $x + h$. Writing down this expression for $k = 2m$ and $x = x_0$ and then subtracting from it the same expression with $h$ replaced by $-h$, we obtain after dividing by $2h$

(12-12)  $D_h f(x_0) = f'(x_0) + a_1 h^2 + a_2 h^4 + \cdots + a_{m-1} h^{2m-2} + R_m(h),$

where

$a_k = \frac{1}{(2k+1)!} f^{(2k+1)}(x_0), \qquad k = 1, 2, \ldots, m - 1,$

$R_m(h) = \frac{h^{2m}}{(2m+1)!} \cdot \frac{1}{2}\left[f^{(2m+1)}(\xi_1) + f^{(2m+1)}(\xi_2)\right].$

Here $\xi_1$ is a point between $x_0$ and $x_0 + h$ and $\xi_2$ is between $x_0$ and $x_0 - h$. If $f^{(2m+1)}$ is continuous, it assumes every value between $f^{(2m+1)}(\xi_1)$ and $f^{(2m+1)}(\xi_2)$. Thus in particular for some $\xi$ between $\xi_1$ and $\xi_2$ (and thus between $x_0 - h$ and $x_0 + h$)

$f^{(2m+1)}(\xi) = \frac{1}{2}\left[f^{(2m+1)}(\xi_1) + f^{(2m+1)}(\xi_2)\right].$

It follows that the remainder $R_m$ in (12-12) can be expressed in the simpler form

(12-13)  $R_m(h) = \frac{h^{2m}}{(2m+1)!} f^{(2m+1)}(\xi).$

With this form of the remainder, (12-12) for $m = 1$ reduces to (12-11).
In numerical applications the values of $f'(x_0)$ and of the constants $a_1, a_2, \ldots$ are of course unknown. But the mere fact that a formula of type (12-12) holds can be used to improve the accuracy of a numerical differentiation. Suppose the operator $D_h$ is applied with two different values of the step, say $h$ and $qh$, where $q \ne 0, 1$. (If $f$ is tabulated at equally spaced intervals, $q = \frac{1}{2}$ is a natural choice.) We then have from (12-12)

$D_h f(x_0) = f'(x_0) + a_1 h^2 + O(h^4),$
$D_{qh} f(x_0) = f'(x_0) + a_1 (qh)^2 + O(h^4),$

since $O((qh)^4) = O(h^4)$. Eliminating from this pair of equations the constant $a_1$ and solving for $f'(x_0)$, we find

(12-14)  $f'(x_0) = \frac{D_{qh} f(x_0) - q^2 D_h f(x_0)}{1 - q^2} + O(h^4).$

Using two values of the basic differentiation operator $D_h$ (which ordinarily has an error $O(h^2)$) we thus have succeeded in deriving a differentiation formula with an error of only $O(h^4)$.

Without a more careful study of the error term we cannot assert that (12-14) is more accurate than (12-10) for any given $h$. However, if $f^{(5)}$ is continuous, then (12-14) is always more accurate for sufficiently small values of $h$.

In the special case $q = 2$ (12-14) takes the form

$f'(x_0) = \frac{1}{3}\left\{4\,\frac{f_1 - f_{-1}}{2h} - \frac{f_2 - f_{-2}}{4h}\right\} + O(h^4).$

We thus obtain the approximate differentiation formula

$f'(x_0) \approx \frac{1}{12h}\{-f_2 + 8f_1 - 8f_{-1} + f_{-2}\},$

which turns out to be identical with (12-8).

Another way to look at formula (12-14), more in the spirit of Aitken’s $\Delta^2$-process, is to consider it as a device to speed up the convergence of the basic differentiation operator $D_h$ as $h \to 0$. The speeded up operator may be written in the form

(12-15)  $D_{qh}^* = \frac{D_{qh} - q^2 D_h}{1 - q^2}.$


EXAMPLE

1. To find the first derivative of $J_0(x)$ at $x = 2$.

Table 12.3

h        D_h J_0(2)        D*_h J_0(2)       D**_h J_0(2)
0.40     -0.56611 8105
0.20     -0.57406 0360     -0.57670 7779
0.10     -0.57605 7896     -0.57672 3741     -0.57672 4805
0.05     -0.57655 8030     -0.57672 4742     -0.57672 4808

The exact value, to nine places, is $J_0'(2) = -0.57672\,4808$.
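The extrapolated columns of table 12.3 can be recomputed from its first column alone. The Python sketch below assumes the step is halved from row to row ($q = 1/2$) and applies the elimination of (12-14) once to remove the $h^2$ term and, in the same manner with $q^4$, once more to remove the $h^4$ term; the list names are invented for the illustration.

```python
# First column of table 12.3: D_h J_0(2) for h = 0.40, 0.20, 0.10, 0.05.
d = [-0.566118105, -0.574060360, -0.576057896, -0.576558030]
q = 0.5   # assumed ratio of successive steps

# One extrapolation step removes the h^2 term of the error.
d_star = [(d[i] - q**2 * d[i - 1]) / (1 - q**2) for i in range(1, len(d))]

# A second step, with q^4 in place of q^2, removes the h^4 term.
d_star2 = [(d_star[i] - q**4 * d_star[i - 1]) / (1 - q**4)
           for i in range(1, len(d_star))]
```

Up to the rounding of the tabulated input, `d_star` and `d_star2` reproduce the second and third columns of the table.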


Nothing prevents us from trying to further speed up the sequence of the values $D_h^* f(x_0)$. The appropriate formula can be obtained by considering the structure of the error of $D_h^* f(x_0)$. From (12-12) we find

$D_{qh} f(x_0) = f'(x_0) + a_1 q^2 h^2 + a_2 q^4 h^4 + \cdots + a_{m-1} q^{2m-2} h^{2m-2} + R_m(qh).$

It follows that

$D_{qh}^* f(x_0) = f'(x_0) + a_2^*(qh)^4 + \cdots + a_{m-1}^*(qh)^{2m-2} + R_m^*(qh),$

where

$a_k^* = \frac{1 - q^{2-2k}}{1 - q^2}\, a_k, \qquad k = 2, 3, \ldots, m - 1,$

$R_m^*(qh) = \frac{R_m(qh) - q^2 R_m(h)}{1 - q^2}.$

Thus, in particular,

$D_h^* f(x_0) = f'(x_0) + a_2^* h^4 + O(h^6),$
$D_{qh}^* f(x_0) = f'(x_0) + a_2^* q^4 h^4 + O(h^6).$

Eliminating $a_2^*$ yields

(12-16)  $f'(x_0) = D_{qh}^{**} f(x_0) + O(h^6),$

where

(12-17)  $D_{qh}^{**} = \frac{D_{qh}^* - q^4 D_h^*}{1 - q^4}.$

Still using the same basic differentiation operator $D_h$, we thus have obtained a differentiation formula with an error $O(h^6)$. Obviously the procedure can be continued, the only limits being those set by the accuracy of the values $f(x_0 \pm h)$ themselves. The effectiveness of the procedure is illustrated by the last column of table 12.3, where the derivative has been obtained to nine significant digits from only six values of the function.


Problems

9. Using Taylor’s expansion, show that

$\frac{1}{h^2}\delta^2 f(x) = f''(x) + \frac{h^2}{12} f^{(4)}(\xi),$

where $x - h < \xi < x + h$.

10. By eliminating $a_1$ and $a_2$ in the formulas

$D_{kh} f(x_0) = f'(x_0) + a_1(kh)^2 + a_2(kh)^4 + O(h^6), \qquad k = 1, 2, 3,$

obtain a differentiation formula that uses the ordinates $f_{-3}, f_{-2}, \ldots, f_3$ and has an error $O(h^6)$. Compare the result with the formula (12-6) for $k = 3$.

11. Suppose that, due to rounding, the values $D_h f(x_0)$ are known only up to rounding errors not exceeding $\varepsilon/h$. If the values $D_h^* f(x_0)$ and $D_h^{**} f(x_0)$ are formed with $q = \frac{1}{2}$, what are their errors due to the inaccuracies in $D_h f(x_0)$?


12.4 Extrapolation to the Limit: The General Case


Extrapolation to the limit as applied to numerical differentiation in §12.3 is a mere special case of the following general situation: An unknown quantity $a_0$ is approximated by a calculable quantity $A(y)$ ($y > 0$) such that

(12-18)  $\lim_{y \to 0} A(y) = a_0,$

and it is known that there exist constants $a_1, a_2, \ldots$ and $C_1, C_2, \ldots$ such that† for $k = 1, 2, 3, \ldots$

(12-19)  $A(y) = a_0 + a_1 y + a_2 y^2 + \cdots + a_{k-1} y^{k-1} + R_k(y),$

where

$|R_k(y)| \le C_k y^k, \qquad y > 0.$

A triangular array of numbers $A_{m,n}$ ($m = 0, 1, 2, \ldots$; $n \le m$) is now formed in the following manner.

Algorithm 12.4 For two fixed constants $r$ and $y_0$ ($0 < r < 1$, $y_0 > 0$), let for $m = 0, 1, 2, \ldots$

$A_{m,0} = A(r^m y_0),$

$A_{m,n+1} = \frac{A_{m,n} - r^{n+1} A_{m-1,n}}{1 - r^{n+1}}, \qquad n = 0, 1, \ldots, m - 1.$

The manner in which the numbers $A_{m,n}$ depend on each other is indicated in scheme 12.4.

A_{0,0}
A_{1,0}   A_{1,1}
A_{2,0}   A_{2,1}   A_{2,2}
A_{3,0}   A_{3,1}   A_{3,2}   A_{3,3}

Scheme 12.4
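Algorithm 12.4 is short enough to sketch directly in Python. The function name `extrapolation_scheme` and the two test functions below are assumptions of this illustration: a cubic in $y$, for which column 3 of the scheme should already return $a_0$ exactly, and the central difference quotient of $\sin x$ at $x = 1$ regarded as a function of $y = h^2$ with $r = 1/4$.

```python
from math import sin, cos, sqrt

def extrapolation_scheme(A, y0, r, rows):
    """Return the triangular array T[m][n] = A_{m,n} of algorithm 12.4."""
    T = []
    for m in range(rows):
        row = [A(r**m * y0)]
        for n in range(m):
            w = r ** (n + 1)
            row.append((row[n] - w * T[m - 1][n]) / (1 - w))
        T.append(row)
    return T

# Test 1: a cubic in y with a_0 = 2; the y, y^2, y^3 terms are eliminated
# one column at a time.
cubic = lambda y: 2 + 3 * y - y**2 + 0.5 * y**3
T_cubic = extrapolation_scheme(cubic, 1.0, 0.5, 5)

# Test 2: numerical differentiation of sin at x = 1, with y = h^2, r = q^2.
def central_quotient(y):
    h = sqrt(y)
    return (sin(1 + h) - sin(1 - h)) / (2 * h)

T_sin = extrapolation_scheme(central_quotient, 0.16, 0.25, 4)
```

In the second test the steps are $h = 0.4, 0.2, 0.1, 0.05$, and the last entry of the last row should agree with $\cos 1$ to many more digits than the unextrapolated quotient in the first column.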


This scheme has the following interesting property. 


† In technical language the hypothesis (12-19) means that $A(y)$ admits an asymptotic expansion for $y \to 0+$. We do not assume that the infinite series $a_0 + a_1 y + a_2 y^2 + \cdots$ converges.


Theorem 12.4 Let $A = A(y)$ satisfy the relations (12-19). Then for each $n$ such that $a_{n+1} \ne 0$, the $(n + 1)$st column in the scheme 12.4 converges faster to $a_0$ than the $n$th column, in the sense that

(12-20)  $\lim_{m \to \infty}\frac{A_{m,n+1} - a_0}{A_{m,n} - a_0} = 0.$

More generally, for each fixed value of $n \ge 0$,

(12-21)  $A_{m,n} = a_0 + (-1)^n r^{-n(n+1)/2} a_{n+1}(r^m y_0)^{n+1} + O((r^m y_0)^{n+2})$

as $m \to \infty$.

Here $O(z)$ denotes a quantity that remains bounded for $z \to 0$ when divided by $z$.
Proof. We shall establish the following proposition, which is stronger than (12-21): For each fixed $n$, and for each $p > n$,

(12-22)  $A_{m,n} = a_0 + a_{n+1,n}(r^m y_0)^{n+1} + a_{n+2,n}(r^m y_0)^{n+2} + \cdots + a_{p,n}(r^m y_0)^p + O((r^m y_0)^{p+1})$

as $m \to \infty$, where for $q = n + 1, n + 2, \ldots$

(12-23)  $a_{q,n} = \frac{(1 - r^{1-q})(1 - r^{2-q}) \cdots (1 - r^{n-q})}{(1 - r)(1 - r^2) \cdots (1 - r^n)}\, a_q.$

(Empty products are to be interpreted as 1.)

The proof is by induction with respect to $n$. For $n = 0$, the formulas (12-22) and (12-23) are a mere restatement of our hypothesis (12-19). Assuming that they are valid for some $n \ge 0$, it follows from algorithm 12.4 that $A_{m,n+1}$ has a representation of the form (12-22) where the coefficient of $(r^m y_0)^q$ is

$\frac{1 - r^{n+1-q}}{1 - r^{n+1}}\, a_{q,n}.$

This is zero for $q = n + 1$ and equals $a_{q,n+1}$ for $q > n + 1$, verifying (12-22) with $n$ increased by one.

Setting $q = n + 1$ in (12-23), we obtain

$a_{n+1,n} = \frac{(1 - r^{-n})(1 - r^{-n+1}) \cdots (1 - r^{-1})}{(1 - r)(1 - r^2) \cdots (1 - r^n)}\, a_{n+1} = (-1)^n r^{-1-2-\cdots-n}\, a_{n+1} = (-1)^n r^{-n(n+1)/2}\, a_{n+1},$

in view of the well-known formula

$1 + 2 + \cdots + n = \frac{n(n+1)}{2}.$

Letting $p = n + 1$ in (12-22) and using the above value of $a_{n+1,n}$, we thus obtain (12-21).

If $a_{n+1} \ne 0$, relation (12-21) may be written

$A_{m,n} - a_0 = (-1)^n r^{-n(n+1)/2} a_{n+1}(r^m y_0)^{n+1}[1 + O(r^m y_0)].$

Thus,

$\frac{A_{m,n+1} - a_0}{A_{m,n} - a_0} = -r^{-(n+1)}\,\frac{a_{n+2}}{a_{n+1}}\, r^m y_0 + O((r^m y_0)^2),$

which implies (12-20).

The technique described in §12.3, and in particular in example 1, corresponds to the special case $A(h^2) = D_h f(x_0)$, $r = q^2$ of algorithm 12.4.


Problems 


12. A numerical computation furnished the following approximations 
A(2~") to,a quantity ao: 


A(2-") 
0.000000 
0.250000 
0.316406 
0.343609 
0.356074 
0.362055 
0.364987 
0.366438 


= 


Determine @ as accurately as you can. (Exact value: 
ay = et = 0.367879.) 
13. Show that the hypotheses of theorem 12.4 are satisfied for the function 
Atyy= (1+ yh" (y > 0). 

14. Show that the scheme generated by algorithm 12.4 is identical with the
scheme that would be obtained by calculating A(0) by Neville interpolation
(algorithm 10.5) using the interpolating points $y_0, ry_0, r^2y_0, \ldots$.

15. Suppose the values of the function A(y) used in algorithm 12.4 are known
only up to errors $\le \varepsilon$. How large is the resulting uncertainty in the
columns $A_{m,n}$? Give numerical values for the "noise amplification
factor" for m = 1, 2, 3 and r = 0.5, r = 0.25.

16. The following relations hold for the symbol O(z) introduced after theorem
12.4:

O(z) + O(z) = O(z);
O(cz) = O(z) for any constant c.


From what theorems about limits do they follow? 


242 elements of numerical analysis 
12.5 Calculating Logarithms by Differentiation†


In this section we shall discuss a further application of algorithm 12.4 
to a problem of numerical differentiation. Let a > 1 be a given number. 
We consider the function 


$$f(x) = a^x$$
and wish to evaluate f’(0). By the rules of calculus, this of course equals 
f'(0) = log a. 


By carrying out the differentiation numerically, we may thus hope to 
obtain a method for calculating the natural logarithm of a given number. 

We use the following one-sided approximation to the first derivative, 
viz. : 


$$S(h) = \frac{f(h) - f(0)}{h}.$$


By the definition of the derivative,

$$\log a = f'(0) = \lim_{h \to 0} S(h).$$

This limit relation also holds if we restrict h to the discrete set of values
$h = 2^{-n}$, n = 0, 1, 2, .... Setting

(12-24) $s_n = S(2^{-n}) = 2^n (a^{2^{-n}} - 1)$

we thus have

$$\log a = \lim_{n \to \infty} s_n.$$

The values $a^{2^{-n}}$ required to form the numbers $s_n$ can easily be generated
recursively by successively evaluating square roots, as follows:

$$a^{2^{0}} = a, \qquad a^{2^{-n}} = \sqrt{a^{2^{-n+1}}}, \quad n = 1, 2, \ldots.$$


We thus have, in principle, solved our problem. 

Readers who have studied §8.4 will immediately observe, however, that
the algorithm thus defined is numerically unstable. The quantity $a^{2^{-n}} - 1$
approaches zero as $n \to \infty$. If the numbers $a^{2^{-n}}$ are computed with a
fixed number of decimals, the relative error of $a^{2^{-n}} - 1$ will ultimately
become quite large. By rule (ii) of §8.4, the relative error of $s_n$ will
likewise be large, and, since the $s_n$ approach a nonzero limit, the absolute
error, too, will grow rapidly.

† This section may be omitted at first reading.

A procedure that is more stable numerically can be defined if we
generate the $s_n$ in a different way. Solving (12-24) for $a^{2^{-n}}$ we get

(12-25) $a^{2^{-n}} = 1 + 2^{-n} s_n$.

On the other hand,

$$s_{n-1} = 2^{n-1}(a^{2^{-n+1}} - 1) = 2^{n-1}(a^{2^{-n}} + 1)(a^{2^{-n}} - 1).$$

Substituting for $a^{2^{-n}}$ the value (12-25), we have

$$s_{n-1} = 2^{n-1}(2 + 2^{-n} s_n)\, 2^{-n} s_n = s_n + 2^{-n-1} s_n^2.$$

Solving the quadratic for $s_n$ and observing that $s_n > 0$, we find

$$s_n = 2^n\left(\sqrt{1 + 2^{-n+1} s_{n-1}} - 1\right).$$

To avoid loss of accuracy, we write this in the form

(12-26) $s_n = \dfrac{2 s_{n-1}}{1 + \sqrt{1 + 2^{-n+1} s_{n-1}}}$.

Together with the initial condition $s_0 = a - 1$ this relation may be used
to generate the sequence $\{s_n\}$ in a stable manner.
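
The difference between the two ways of generating $s_n$ can be seen in ordinary double-precision arithmetic. The sketch below is only an illustration: the function names `s_naive` and `s_stable` are hypothetical, and binary floating point stands in for the fixed number of decimals mentioned above.

```python
import math

def s_naive(a, n):
    """s_n = 2^n (a^(2^-n) - 1), with a^(2^-n) obtained by n square roots."""
    root = a
    for _ in range(n):
        root = math.sqrt(root)
    # The subtraction root - 1 cancels almost all significant digits for large n.
    return 2.0 ** n * (root - 1.0)

def s_stable(a, n):
    """s_n from the recurrence (12-26), starting with s_0 = a - 1."""
    s = a - 1.0
    for k in range(1, n + 1):
        s = 2.0 * s / (1.0 + math.sqrt(1.0 + 2.0 ** (1 - k) * s))
    return s
```

For small n the two functions agree closely; for large n (say n = 60 with a = 6) the value returned by `s_naive` bears no resemblance to log 6, whereas `s_stable` stays close to it.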

† Relation (12-26) also provides us with an example of those relatively rare non-linear
difference equations that have an explicit solution.

For numerical purposes the convergence of the algorithm thus obtained,
even though stable, is intolerably slow (see problem 17). If it is to be put
to any practical use, we must be able to speed it up. Let us check whether
the function $S(h)$ satisfies the hypotheses of theorem 12.4. Since
$a^h = e^{h \log a}$ we have, using the exponential series,

$$S(h) = \log a + \frac{1}{2!}(\log a)^2 h + \frac{1}{3!}(\log a)^3 h^2 + \cdots.$$

Since this series converges for all values of h, condition (12-19) certainly
holds for $A(h) = S(h)$, and theorem 12.4 is applicable. We thus obtain


Algorithm 12.5 For a > 1, let $A_{0,0} = a - 1$, and calculate for
m = 1, 2, ...

$$A_{m,0} = \frac{2A_{m-1,0}}{1 + \sqrt{1 + 2^{-m+1} A_{m-1,0}}},$$

$$A_{m,n+1} = \frac{2^{n+1} A_{m,n} - A_{m-1,n}}{2^{n+1} - 1},$$

where

(12-27) n = 0, 1, ..., m − 1.


Theorem 12.4 yields in the present special case 


Theorem 12.5 All columns of the scheme generated in algorithm 12.5 
converge to log a, and each column converges faster than the preceding 
one. 


One can also show that the sequence $\{A_{n,n}\}$ of diagonal elements
converges to log a, and converges faster than any column.


EXAMPLE 
2. For a = 6 algorithm 12.5 yields the following triangular array: 


Table 12.5


5.000000 

2.898980 0.797959 

2.260338 1.621697 1.896277 

2.008267 1.756196 1.801029 1.787422 

1.895937 1.783606 1.792743 1.791559 1.791835 

1.842872 1.789806 1.791873 1.791748 1.791761 1.791759 
1.817076 1.791281 1.791773 1.791759 1.791759 1.791759 1.791759 


The table shows that log 6 = 1.791759 can be obtained to seven significant 
digits by the evaluation of six square roots. 

If higher accuracy is desired, it may be necessary to limit the number of 
columns of the scheme in order to avoid build-up of rounding errors. 
Condition (12-27) should then be replaced by 

n=0Q0,1,...,min(m, N) — 1 


where N is the number of columns desired. 
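
Algorithm 12.5, including the optional limit on the number of extrapolation steps, can be sketched in a few lines of Python. The function name `log_table` and the parameter `max_cols` (playing the role of N above) are hypothetical; double-precision arithmetic is assumed, so the entries agree with table 12.5 only to the digits shown there.

```python
import math

def log_table(a, rows, max_cols=None):
    """Triangular array of algorithm 12.5 for log a (a > 1).

    max_cols plays the role of N in the condition
    n = 0, 1, ..., min(m, N) - 1; None means no limit.
    """
    table = [[a - 1.0]]                                   # A_{0,0} = a - 1
    for m in range(1, rows):
        prev = table[m - 1]
        # First column: stable recurrence for s_m.
        first = 2.0 * prev[0] / (1.0 + math.sqrt(1.0 + 2.0 ** (1 - m) * prev[0]))
        row = [first]
        limit = m if max_cols is None else min(m, max_cols)
        for n in range(limit):
            p = 2.0 ** (n + 1)
            row.append((p * row[n] - prev[n]) / (p - 1.0))
        table.append(row)
    return table

table = log_table(6.0, 7)        # the seven rows of table 12.5
approx_log6 = table[6][6]        # last diagonal element
```

With `max_cols=2`, for example, every row from the third on is cut off after two extrapolation steps, which is the modification suggested for higher accuracy.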


Problems 


17. Using Taylor's formula, find a lower bound for the error $|S(h) - \log a|$.
Thus find a lower bound for the number n necessary that

$$|s_n - \log a| < 0.5 \times 10^{-8},$$

if a = 6.

18. Use algorithm 12.5 to calculate log e to six decimals.

19. Let $1 \le a \le e$. By obtaining explicit values for the constants $C_k$ in
(12-19) from Taylor's formula, find bounds for the errors of the quantities
$A_{m,1}$ and $A_{m,2}$, considered as approximations to log a.


Recommended Reading 


The principle of extrapolation to the limit was first clearly stated 
by Richardson [1927]. Repeated extrapolation to the limit is discussed 
by Bauer et al. [1963]. Concerning its application to numerical
differentiation, see also Rutishauser [1963]. 


Research Problem 


Since every positive number can be represented in the form $e^n z$, where
$e^{-1/2} \le z < e^{1/2} = 1.648\ldots$, it is only necessary to know log x in the
interval $(1, e^{1/2})$. Make a comparative study of the computation of log x
in that interval (a) by algorithm 12.5, (b) by Taylor's series.


chapter 13 numerical integration


We now turn to the problem of numerical evaluation of definite integrals. 
The method is the same as in chapter 12. Instead of performing the 
integration on the function f, which may be difficult, we perform the 
integration on a polynomial interpolating f at suitable points. We begin
by giving a theoretical appraisal of the error committed in this 
approximation. 


13.1 The Error in Numerical Integration 


Our starting point is once again the general error formula proved in
§9.2. There it was shown that if $P_S$ interpolates f on the set of points
$S = \{x_0, x_1, \ldots, x_n\}$, then

(13-1) $f(x) - P_S(x) = L(x)g(x)$,

where

$$L(x) = (x - x_0)(x - x_1)\cdots(x - x_n),$$

and where g is a continuous function that can be expressed in the form

(13-2) $g(x) = \dfrac{1}{(n+1)!} f^{(n+1)}(\xi_x)$,

$\xi_x$ being a suitable number contained between the largest and the smallest
of the numbers $x_0, \ldots, x_n$ and x. If we integrate (13-1) between two
arbitrary limits a and b, we evidently have

$$\int_a^b f(x)\,dx = \int_a^b P_S(x)\,dx + R,$$

where the integration error R can be expressed in the form

(13-3) $R = \int_a^b L(x)g(x)\,dx$.


numerical integration 247 


Let us now assume that the polynomial L does not change its sign in the
interval (a, b). This will evidently be the case if and only if no
interpolating point $x_i$ is contained in the interior of (a, b). We then can apply
to the integral representation (13-3) the second mean value theorem of the
integral calculus (see Buck [1956], p. 58), with the result that

$$R = g(\zeta)\int_a^b L(x)\,dx,$$

where $\zeta$ is a suitable point in (a, b). In view of the definition of g, we thus
have the following result:

Theorem 13.1 Let the function f and the polynomial P satisfy the
conditions of theorem 9.2. If the interval (a, b) contains no
interpolating point in its interior, then

(13-4) $\int_a^b f(x)\,dx - \int_a^b P(x)\,dx = \dfrac{f^{(n+1)}(\xi)}{(n+1)!}\int_a^b L(x)\,dx$,

where $\xi$ is a number contained between the largest and the smallest
of the numbers $x_0, \ldots, x_n$, a, b.

It is clear that theorem 13.1 cannot be true if (a, b) contains an
interpolating point, since it then may happen that the integral on the right of
(13-4) vanishes.


EXAMPLE
1. $x_0 = -1$, $x_1 = 0$, $x_2 = 1$, $L(x) = x(x^2 - 1)$, $a = -1$, $b = 1$,

$$\int_{-1}^{1} L(x)\,dx = 0.$$


Problems 


1. The integral $\int_0^{\pi} \sin x\,dx$ is evaluated by interpolating the function f(x) =
sin x at the points x = 0 and x = π. Calculate the bound for the error
given by theorem 13.1 (using the fact that all derivatives are bounded by 1),
and compare it with the actual error.

2. Dropping the assumption that the interval (a, b) contains no interpolating
point, show that

$$\left|\int_a^b f(x)\,dx - \int_a^b P(x)\,dx\right| \le \frac{M_{n+1}}{(n+1)!}\int_a^b |L(x)|\,dx,$$

where $M_{n+1} = \max_x |f^{(n+1)}(x)|$.




13.2 Numerical Integration using Backward Differences 


Once again we specialize to the situation where the interpolating points
$x_k$ are equidistant, $x_k = x_0 + kh$, k = ±1, ±2, .... We begin by
deriving an integration formula which appears to be lacking in symmetry,
but which for this very reason will be of use in the numerical integration of
differential equations.

Let us assume that the function f is interpolated at the points $x_0$,
$x_{-1}, \ldots, x_{-k}$, and that the integral of f is desired between the limits $x_0$ and
$x_1$. Setting as before

$$s = \frac{x - x_0}{h},$$

we use the Newton backward polynomial in the form given in problem 7,
chapter 11,

$$P_k(x) = \sum_{m=0}^{k} (-1)^m \binom{-s}{m} \nabla^m f_0.$$

Introducing s as variable of integration and observing that the differences
are independent of s, we find

$$\int_{x_0}^{x_1} P_k(x)\,dx = h \int_0^1 \Big\{\sum_{m=0}^{k} (-1)^m \binom{-s}{m} \nabla^m f_0\Big\}\,ds = h \sum_{m=0}^{k} c_m \nabla^m f_0,$$

where

(13-5) $c_m = (-1)^m \int_0^1 \binom{-s}{m}\,ds$, m = 0, 1, 2, ....

We also have, in the present case

$$\frac{1}{(k+1)!} L(x) = \frac{(x - x_0)(x - x_{-1})\cdots(x - x_{-k})}{(k+1)!} = h^{k+1}\,\frac{s(s+1)\cdots(s+k)}{(k+1)!} = (-1)^{k+1} h^{k+1} \binom{-s}{k+1}.$$

Since L does not vanish in the interval $(x_0, x_1)$, and since

$$\frac{1}{(k+1)!}\int_{x_0}^{x_1} L(x)\,dx = h^{k+2} c_{k+1},$$

theorem 13.1 thus yields the formula

(13-6) $\int_{x_0}^{x_1} f(x)\,dx = h\{c_0 f_0 + c_1 \nabla f_0 + c_2 \nabla^2 f_0 + \cdots + c_k \nabla^k f_0 + c_{k+1} h^{k+1} f^{(k+1)}(\xi)\}$,

where $x_{-k} < \xi < x_1$, and where the constants $c_m$ are given by (13-5).
As on several previous occasions it is seen that the error due to
interpolation equals the first omitted term in the difference formula, provided
that the difference is replaced by the corresponding derivative, suitably
normalized (compare similar statements in §11.2 and §12.2).

From the definition (13-5) we find

$$c_0 = 1, \qquad c_1 = \tfrac{1}{2}.$$

For larger values of m, however, the $c_m$ are more easily calculated by a
method to be explained in §13.4.

In an entirely similar manner we can derive the formula

(13-7) $\int_{x_{-1}}^{x_0} f(x)\,dx = h\{c_0^* f_0 + c_1^* \nabla f_0 + c_2^* \nabla^2 f_0 + \cdots + c_k^* \nabla^k f_0 + c_{k+1}^* h^{k+1} f^{(k+1)}(\xi^*)\}$,

where $x_{-k} < \xi^* < x_0$, and

(13-8) $c_m^* = (-1)^m \int_{-1}^{0} \binom{-s}{m}\,ds$, m = 0, 1, 2, ....

In particular we find

$$c_0^* = 1, \qquad c_1^* = -\tfrac{1}{2}.$$

Again a similar statement about the error applies.


Problem 


3. Find a formula that integrates between $x_{-1/2}$ and $x_{1/2}$, using backward
differences at $x_0$. What can you say about the error?


13.3. Numerical Integration using Central Differences 


If values of f are available on both sides of the interval of integration, it
seems preferable to perform the integration on a polynomial that takes into
account all these values. Bessel's formula (11-20) recommends itself for
the purpose of integrating between the limits $x_0$ and $x_1$. We again
introduce s as a variable of integration. If s is replaced by 1 − s, the
binomial coefficients

$$\binom{s + m - 1}{2m}$$

remain unchanged, but the factors $s - \tfrac{1}{2}$ change sign. It follows that all
integrals of coefficients of the form

$$\int_0^1 (s - \tfrac{1}{2}) \binom{s + m - 1}{2m}\,ds$$

vanish. If $P_{2k+1}$ denotes the polynomial interpolating f on the set
$x_{-k}, x_{-k+1}, \ldots, x_{k+1}$, we thus find

$$\int_{x_0}^{x_1} P_{2k+1}(x)\,dx = h \int_0^1 \sum_{m=0}^{k} \binom{s + m - 1}{2m} \mu\delta^{2m} f_{1/2}\,ds = h \sum_{m=0}^{k} b_{2m}\, \mu\delta^{2m} f_{1/2},$$

where

(13-9) $b_{2m} = \int_0^1 \binom{s + m - 1}{2m}\,ds$, m = 0, 1, 2, ....

Since, in the present case,

$$\frac{1}{(2k+2)!} L(x) = \frac{(x - x_{-k})(x - x_{-k+1})\cdots(x - x_{k+1})}{(2k+2)!} = h^{2k+2} \binom{s + k}{2k + 2}$$

and consequently

$$\frac{1}{(2k+2)!}\int_{x_0}^{x_1} L(x)\,dx = h^{2k+3} b_{2k+2},$$

theorem 13.1 yields the formula

(13-10) $\int_{x_0}^{x_1} f(x)\,dx = h\{b_0 \mu f_{1/2} + b_2 \mu\delta^2 f_{1/2} + \cdots + b_{2k} \mu\delta^{2k} f_{1/2} + b_{2k+2} h^{2k+2} f^{(2k+2)}(\xi)\}$,

where $x_{-k} < \xi < x_{k+1}$.

We easily find $b_0 = 1$, $b_2 = -\tfrac{1}{12}$. Thus the case k = 0 of (13-10)
yields

$$\int_{x_0}^{x_1} f(x)\,dx = \frac{h}{2}(f_0 + f_1) - \frac{h^3}{12} f''(\xi),$$

where $x_0 < \xi < x_1$. This is the familiar trapezoidal rule of numerical
integration. More about it in §13.6!
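
The error term of the trapezoidal rule can be checked on a simple case. The following sketch, whose names and choice of test function $f(x) = e^x$ on [0, 0.5] are purely illustrative, computes the actual error and solves $-\tfrac{h^3}{12} f''(\xi) = \text{error}$ for $\xi$, which should then lie between $x_0$ and $x_1$.

```python
import math

def trapezoid_single(f, x0, x1):
    """One-interval trapezoidal value (h/2) * (f(x0) + f(x1))."""
    return 0.5 * (x1 - x0) * (f(x0) + f(x1))

x0, x1 = 0.0, 0.5
h = x1 - x0
exact = math.exp(x1) - math.exp(x0)                 # integral of e^x over [x0, x1]
error = exact - trapezoid_single(math.exp, x0, x1)  # equals -(h^3/12) f''(xi)
# Since f'' = exp, xi can be solved for explicitly.
xi = math.log(-12.0 * error / h ** 3)
```

The error is negative, as the formula predicts for a function with positive second derivative, and the computed `xi` lies inside the interval of integration.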


Problems 


4. By integrating Stirling's formula (11-19) between the limits $x_{-1}$ and $x_1$
obtain an approximate integration formula of the form

$$\int_{x_{-1}}^{x_1} f(x)\,dx = h\{s_0 f_0 + s_1 \delta^2 f_0 + \cdots + s_k \delta^{2k} f_0 + R_{k+1}\},$$

and show that the remainder $R_{k+1}$ is of the order of $h^{2k+2}$. (Theorem 13.1
is not applicable in this case, why?) In particular, show that $s_0 = 2$,
$s_1 = \tfrac{1}{3}$ and hence obtain the approximate integration formulas

$$\int_{x_{-1}}^{x_1} f(x)\,dx = 2hf_0 + O(h^3)$$ (Midpoint rule);

$$\int_{x_{-1}}^{x_1} f(x)\,dx = \frac{h}{3}(f_{-1} + 4f_0 + f_1) + O(h^5)$$ (Simpson's rule).

5. Using Taylor's expansion, show that

$$\int_{x_{-1}}^{x_1} f(x)\,dx = 2hf_0 + \frac{h^3}{3} f''(\xi), \qquad x_{-1} < \xi < x_1.$$


13.4 Generating Functions for Integration Coefficients†


We left unresolved the problem of finding numerical values of the
coefficients $c_m$, $c_m^*$, and $b_{2m}$ introduced in the two preceding sections.
Such values are conveniently found by the method of generating functions.
The generating function of a sequence of coefficients $c_m$, for instance, is
the function C defined by the power series

$$C(t) = \sum_{m=0}^{\infty} c_m t^m.$$

If we succeed in determining a closed formula for C, we may hope to find
numerical values of the $c_m$ in a very simple manner.

† This section may be omitted at first reading.

We exemplify the method with the coefficients $c_m^*$ defined by (13-8).
Their generating function is

$$C^*(t) = \sum_{m=0}^{\infty} c_m^* t^m = \sum_{m=0}^{\infty} (-t)^m \int_{-1}^{0} \binom{-s}{m}\,ds.$$

It is easily seen that $|c_m^*| \le 1$, m = 0, 1, 2, .... Hence the power series
converges uniformly for $|t| \le \tfrac{1}{2}$, say. Interchanging summation and
integration we find

$$C^*(t) = \int_{-1}^{0} \sum_{m=0}^{\infty} \binom{-s}{m} (-t)^m\,ds.$$

By the general form of the binomial theorem (see Taylor [1959], p. 479),

$$\sum_{m=0}^{\infty} \binom{-s}{m} (-t)^m = (1 - t)^{-s}.$$

We thus find

$$C^*(t) = \int_{-1}^{0} (1 - t)^{-s}\,ds.$$

The integration can be carried out by observing that

$$(1 - t)^{-s} = e^{-s \log(1 - t)}.$$

We thus find

$$C^*(t) = \left[-\frac{e^{-s\log(1-t)}}{\log(1-t)}\right]_{s=-1}^{s=0} = -\frac{t}{\log(1 - t)}.$$

Observing

$$-\frac{\log(1 - t)}{t} = 1 + \frac{1}{2}t + \frac{1}{3}t^2 + \frac{1}{4}t^3 + \cdots,$$

we thus have the identity

$$1 = -\frac{\log(1-t)}{t}\, C^*(t) = (c_0^* + c_1^* t + c_2^* t^2 + \cdots)\left(1 + \frac{1}{2}t + \frac{1}{3}t^2 + \cdots\right).$$

Comparing coefficients of like powers of t on both sides of this identity, we
find

$$c_0^* = 1,$$

$$c_m^* + \frac{1}{2}c_{m-1}^* + \frac{1}{3}c_{m-2}^* + \cdots + \frac{1}{m+1}c_0^* = 0, \qquad m = 1, 2, \ldots.$$

These relations can readily be used to calculate the values of the coefficients
$c_m^*$ recursively.

Table 13.4a

    m         0      1       2       3        4         5
    c*_m      1    -1/2    -1/12   -1/24   -19/720   -3/160


In a like manner, we obtain for the generating function

(13-12) $C(t) = \sum_{m=0}^{\infty} c_m t^m$

of the coefficients $c_m$ defined by (13-5) the closed expression

(13-13) $C(t) = -\dfrac{t}{(1 - t)\log(1 - t)}$,

the recurrence relation $c_0 = 1$,

$$c_m + \frac{1}{2}c_{m-1} + \frac{1}{3}c_{m-2} + \cdots + \frac{1}{m+1}c_0 = 1, \qquad m = 1, 2, \ldots,$$

and from it the numerical values

Table 13.4b

    m         0      1      2      3       4         5
    c_m       1     1/2    5/12   3/8   251/720   95/288
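
Both recurrence relations have the same left-hand side and differ only in the right-hand side (1 for the $c_m$, 0 for the $c_m^*$ with m ≥ 1), so one routine with exact rational arithmetic reproduces both tables. The function name `integration_coefficients` is a hypothetical choice for this sketch.

```python
from fractions import Fraction

def integration_coefficients(count, rhs):
    """Solve c_m + c_{m-1}/2 + ... + c_0/(m+1) = rhs(m) for m = 0, ..., count-1."""
    c = []
    for m in range(count):
        acc = Fraction(rhs(m))
        for j in range(1, m + 1):
            acc -= c[m - j] / (j + 1)
        c.append(acc)
    return c

c = integration_coefficients(6, lambda m: 1)                        # (13-5)
c_star = integration_coefficients(6, lambda m: 1 if m == 0 else 0)  # (13-8)
```

Printing `c` and `c_star` gives the entries of tables 13.4b and 13.4a as exact fractions.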


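In order to obtain a recurrence relation for the coefficients $b_{2m}$ defined
by (13-9), we write their generating function in the form

$$B(t) = \sum_{m=0}^{\infty} b_{2m} (2t)^{2m} = \int_0^1 \sum_{m=0}^{\infty} \binom{s + m - 1}{2m} (2t)^{2m}\,ds.$$

To evaluate it, we require the formula

(13-14) $\sum_{m=0}^{\infty} \binom{s + m - 1}{2m} (2t)^{2m} = \dfrac{(T + t)^{2s-1} + (T - t)^{2s-1}}{2T}$,

where

$$T = \sqrt{1 + t^2}.$$

This is a result from the theory of Legendre functions (see Erdelyi [1953],
equation 3.2 (14) in connection with 3.5 (12)) whose derivation cannot be
given here. Some manipulation yields

$$B(t) = \frac{1}{T}\,\frac{t}{\log(T + t)}.$$

Since

$$\frac{d}{dt}\log(T + t) = \frac{1}{T} = 1 + \binom{-1/2}{1} t^2 + \binom{-1/2}{2} t^4 + \cdots,$$

we easily find

$$\frac{1}{t}\log(T + t) = 1 + \frac{1}{3}\binom{-1/2}{1} t^2 + \frac{1}{5}\binom{-1/2}{2} t^4 + \cdots.$$

Hence we have

$$\frac{\log(T + t)}{t}\, B(t) = \left[1 + \frac{1}{3}\binom{-1/2}{1} t^2 + \cdots\right]\left[b_0 + 4b_2 t^2 + 16 b_4 t^4 + \cdots\right] = 1 + \binom{-1/2}{1} t^2 + \binom{-1/2}{2} t^4 + \cdots.$$

Comparing coefficients, we get

(13-15) $4^m b_{2m} + \dfrac{1}{3}\binom{-1/2}{1} 4^{m-1} b_{2m-2} + \cdots + \dfrac{1}{2m+1}\binom{-1/2}{m} b_0 = \binom{-1/2}{m}$, m = 0, 1, 2, ....

The following values are obtained:

Table 13.4c

    m         0      1        2          3
    b_{2m}    1    -1/12    11/720   -191/60480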
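
A recurrence of the kind obtained by comparing coefficients can be run in exact rational arithmetic. The sketch below assumes the relation in the form $\sum_{j=0}^{m} \frac{1}{2j+1}\binom{-1/2}{j} 4^{m-j} b_{2(m-j)} = \binom{-1/2}{m}$; the function names are hypothetical.

```python
from fractions import Fraction

def binom_neg_half(j):
    """Binomial coefficient (-1/2 choose j) as an exact fraction."""
    out = Fraction(1)
    for i in range(1, j + 1):
        out *= Fraction(-(2 * i - 1), 2 * i)
    return out

def bessel_integration_coefficients(count):
    """Return [b_0, b_2, b_4, ...] (count entries) from the assumed recurrence."""
    b = []
    for m in range(count):
        acc = binom_neg_half(m)
        for j in range(1, m + 1):
            acc -= binom_neg_half(j) / (2 * j + 1) * 4 ** (m - j) * b[m - j]
        b.append(acc / 4 ** m)
    return b

b = bessel_integration_coefficients(4)
```

The four values in `b` are the entries of table 13.4c.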


Problems 


6. Find the generating function for the coefficients in the formula

$$f'(x_0) = \frac{1}{h}\left[a_1 \Delta f_0 + a_2 \Delta^2 f_0 + \cdots\right]$$

obtained by differentiating the Newton forward interpolating polynomial
at $x = x_0$.
7*. Obtain the generating function for the coefficients introduced in problem 4.
8. Using generating functions, show that 


$$c_0^* + c_1^* + \cdots + c_m^* = c_m.$$


13.5 Numerical Integration over Extended Intervals 


In §13.2 and §13.3 we have considered the problem of evaluating the
integral of a function f over an interval of length h in terms of differences
calculated with the step h. Here we shall study the equally important
problem of evaluating

$$I = \int_a^b f(x)\,dx,$$

where [a, b] is a fixed, not necessarily short, interval, as accurately as
possible. It is assumed that f is continuous on [a, b] and can be evaluated
at arbitrary points of that interval. Several procedures offer themselves;
we shall be able to dismiss two of them very briefly.

(i) Newton-Cotes formulas. The most natural idea that offers itself
seems to be to select a certain number of interpolating points within [a, b], to
interpolate f at these points, and to approximate the integral of f by the
integral of the interpolating polynomial. If the interpolating points
divide [a, b] into N equal parts, we arrive in this manner at certain
integration formulas which are called the Newton-Cotes formulas. Unfortunately
these formulas have, for large values of N, some very undesirable
properties. In particular, it turns out that there exist functions, even analytic
ones, for which the sequence of the integrals of the interpolating
polynomials does not converge towards the integral of the function f. Also,
the coefficients in these formulas are large and alternate in sign, which is
undesirable for the propagation of rounding error. For these reasons,
the Newton-Cotes formulas are rarely used for high values of N. For
N = 2, 3, 4 the formulas are identical with certain well-known integration
formulas which we shall discuss below from a different point of view.

(ii) Gaussian quadrature. One may try to avoid some of the
shortcomings of the Newton-Cotes formulas by relinquishing the equal spacing
of the interpolating points. Gauss discovered that by a proper choice of
the interpolating points one can construct integration formulas which,
using N + 1 interpolating points, give the accurate value of the integral if
f is a polynomial of degree 2N + 1 or less. These formulas turn out to be
numerically stable, and they are in successful use at a number of
computation laboratories. The formulas suffer from the disadvantage, however,
that the interpolating points as well as the corresponding weights are
irregular numbers that have to be stored. This practically (although not
theoretically) limits the applicability of these highly interesting formulas.

In the following two sections, we shall discuss two integration schemes 
that are easy to use in practice. They are (iii) the Trapezoidal rule with 
end correction, and (iv) Romberg integration. 

Both can be regarded as more sophisticated forms of the trapezoidal 
rule discussed in §13.3. 


13.6 Trapezoidal Rule with End Correction 


As in the discussion of the Newton-Cotes formula, let us divide the
interval [a, b] into N equal parts of length

$$h = \frac{b - a}{N}.$$

We write

$$x_n = a + nh, \qquad n = 0, \pm 1, \pm 2, \ldots,$$

and evaluate the integral of f over each of the subintervals $[x_{n-1}, x_n]$
separately, using a different interpolating polynomial each time. Using
the symmetrical formula (13-10), we find for the integral over the nth
subinterval, if f has a continuous derivative of order 2k + 2,

(13-16) $\int_{x_{n-1}}^{x_n} f(x)\,dx = h\{b_0 \mu f_{n-1/2} + b_2 \mu\delta^2 f_{n-1/2} + \cdots + b_{2k} \mu\delta^{2k} f_{n-1/2} + b_{2k+2} h^{2k+2} f^{(2k+2)}(\xi_n)\}$,

where $x_{n-k-1} < \xi_n < x_{n+k}$. Adding up the integrals (13-16) for n =
1, 2, ..., N, we clearly obtain, in view of $x_0 = a$, $x_N = b$,

(13-17) $I = \int_a^b f(x)\,dx = h b_0 \sum_{n=1}^{N} \mu f_{n-1/2} + h b_2 \sum_{n=1}^{N} \mu\delta^2 f_{n-1/2} + \cdots + h b_{2k} \sum_{n=1}^{N} \mu\delta^{2k} f_{n-1/2} + b_{2k+2} h^{2k+3} \sum_{n=1}^{N} f^{(2k+2)}(\xi_n)$,

where $b_0 = 1$, $b_2 = -\tfrac{1}{12}$, ..., are the constants defined by (13-9) and
tabulated in §13.4. Let us consider the sums on the right separately. In
view of

$$\mu f_{n-1/2} = \tfrac{1}{2}(f_{n-1} + f_n)$$

the first term

$$h \sum_{n=1}^{N} \mu f_{n-1/2} = h\left[\tfrac{1}{2}(f_0 + f_1) + \tfrac{1}{2}(f_1 + f_2) + \cdots + \tfrac{1}{2}(f_{N-1} + f_N)\right]$$

clearly reduces to

(13-18) $T_N = h\left[\tfrac{1}{2}f_0 + f_1 + f_2 + \cdots + f_{N-1} + \tfrac{1}{2}f_N\right]$.

We shall call this term the N-point trapezoidal value of the desired integral.
It is the result of evaluating the integral by the trapezoidal rule familiar
from elementary calculus (see Taylor [1959], p. 515).

Even more drastic simplifications occur in the other sums on the right
of (13-17). Recalling that

$$\delta^{2m} f_{n-1/2} = \delta^{2m-1} f_n - \delta^{2m-1} f_{n-1}$$

and hence

$$\mu\delta^{2m} f_{n-1/2} = \mu\delta^{2m-1} f_n - \mu\delta^{2m-1} f_{n-1},$$

we find that the remaining sums in (13-17) "telescope" as follows:

$$\sum_{n=1}^{N} \mu\delta^{2m} f_{n-1/2} = \sum_{n=1}^{N} \left(\mu\delta^{2m-1} f_n - \mu\delta^{2m-1} f_{n-1}\right) = \mu\delta^{2m-1} f_N - \mu\delta^{2m-1} f_0$$

(m = 1, 2, ..., k). Thus, each of the sums multiplying $b_2, b_4, \ldots, b_{2k}$
reduces to a difference of two central differences at the endpoints of the
interval of integration.

In order to simplify the last term, let us denote by M the maximum and
by m the minimum of the function $f^{(2k+2)}(x)$ for $x_{-k} \le x \le x_{N+k}$.
The sum

$$\sum_{n=1}^{N} f^{(2k+2)}(\xi_n),$$

having N = (b − a)/h terms, is then contained between the limits

$$\frac{b - a}{h}\, m \quad \text{and} \quad \frac{b - a}{h}\, M.$$

Since the continuous function $f^{(2k+2)}$ assumes all values between its
extreme values, there must be a $\xi$ in the above-mentioned interval such
that

$$\sum_{n=1}^{N} f^{(2k+2)}(\xi_n) = \frac{b - a}{h}\, f^{(2k+2)}(\xi).$$

For that value of $\xi$, we thus have

$$b_{2k+2} h^{2k+3} \sum_{n=1}^{N} f^{(2k+2)}(\xi_n) = (b - a) h^{2k+2} b_{2k+2} f^{(2k+2)}(\xi).$$

Gathering together the above results, we have

Theorem 13.6 Let N and k be any two positive integers, and let f
have a continuous derivative of order 2k + 2 on the interval J =
[a − kh, b + kh]. Then

(13-19) $\int_a^b f(x)\,dx = T_N + C_N^{(k)} + R_N^{(k)}$,

where $T_N$ denotes the trapezoidal value of the integral defined by
(13-18), $C_N^{(k)}$ denotes the "end correction"

(13-20) $C_N^{(k)} = h\left[b_2(\mu\delta f_N - \mu\delta f_0) + b_4(\mu\delta^3 f_N - \mu\delta^3 f_0) + \cdots + b_{2k}(\mu\delta^{2k-1} f_N - \mu\delta^{2k-1} f_0)\right]$

and where, for some suitable $\xi \in J$,

(13-21) $R_N^{(k)} = h^{2k+2} b_{2k+2} (b - a) f^{(2k+2)}(\xi)$.


In order to formulate the integration procedure implied in theorem 13.6
in algorithmic terms, one would have to devise a systematic method for
forming the sequence of end corrections $C_N^{(k)}$ (k = 1, 2, ...). This
would best be done by first forming, in the neighborhood of the points
$x_0$ and $x_N$, the two-step differences

$$\mu\delta f_n = \tfrac{1}{2}(f_{n+1} - f_{n-1})$$

and then using the identities

$$\mu\delta^{2m-1} f_n = \delta^{2m-2}(\mu\delta f_n) = \delta^{2m-2}\left(\frac{f_{n+1} - f_{n-1}}{2}\right) \qquad (n = 0, N).$$

The coefficients $b_{2m}$ can be generated recursively from (13-15).

EXAMPLE
2. To evaluate $\int_0^{1.6} J_0(x)\,dx$. We choose the step h = 0.1, leading to
N = 16. From a table of Bessel functions we find

$$T_{16} = 1.28934\ 6003.$$

Since $J_0(-x) = J_0(x)$, the contribution to the end correction at x = 0
is zero. At $x_{16} = 1.6$ we find

    μδf_16   = -0.05692 1406,    h b_2 μδf_16   = 0.00047 4345,
    μδ³f_16  =  0.00040 8458,    h b_4 μδ³f_16  = 0.00000 0624,
    μδ⁵f_16  = -0.00000 3327,    h b_6 μδ⁵f_16  = 0.00000 0001,

yielding the more accurate value

$$\int_0^{1.6} J_0(x)\,dx = 1.28982\ 0973,$$

correct to the number of places given.
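
The computation of example 2 can be repeated without a table of Bessel functions. In the following sketch $J_0$ is evaluated from its power series, which is an assumption made here for self-containedness; the function names are hypothetical, and only the three coefficients $b_2$, $b_4$, $b_6$ of table 13.4c are used.

```python
B = [-1.0 / 12.0, 11.0 / 720.0, -191.0 / 60480.0]   # b_2, b_4, b_6

def j0(x, terms=30):
    """Bessel function J_0 by its power series (adequate for moderate |x|)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total

def trapezoid(f, a, b, n):
    """N-point trapezoidal value T_N of (13-18)."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def mu_delta_odd(f, x, h, m):
    """mu delta^(2m-1) f at x: delta^2 applied (m-1) times to two-step differences."""
    vals = [(f(x + (j + 1) * h) - f(x + (j - 1) * h)) / 2.0
            for j in range(-(m - 1), m)]
    for _ in range(m - 1):
        vals = [vals[i - 1] - 2.0 * vals[i] + vals[i + 1]
                for i in range(1, len(vals) - 1)]
    return vals[0]

def end_correction(f, a, b, n, k):
    """End correction C_N^(k) of (13-20) for k <= 3."""
    h = (b - a) / n
    return h * sum(B[m - 1] * (mu_delta_odd(f, b, h, m) - mu_delta_odd(f, a, h, m))
                   for m in range(1, k + 1))

t16 = trapezoid(j0, 0.0, 1.6, 16)
corrected = t16 + end_correction(j0, 0.0, 1.6, 16, 3)
```

Note that the end correction requires values of f outside [a, b]; here they are available because the series defines $J_0$ for all x.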


Problems 


9. Evaluate 


with an error less than $10^{-8}$.


10. Devise a method, similar to the one discussed above, based on the formula 
obtained in problem 4. (Divide into an even number of subintervals.) 

11. Using the integration formula (13-6) and a similar formula involving
forward differences, obtain an end correction for the trapezoidal rule that
involves values $f_n$ satisfying $0 \le n \le N$ only.

12. Show that if the function f is periodic with period τ, and if formula (13-19)
is used to evaluate the integral of f over a full period, then
$C_N^{(k)} = 0$ for all k and N.

13. Problem 12 shows that for the integration of a sufficiently differentiable
periodic function over a full period,

$$\int_0^{\tau} f(x)\,dx - T_N = O(h^{2k+2})$$

for every k. Does it follow that the trapezoidal value is exact?

14. Show that the trapezoidal rule $T_N$ yields for N > 1 the exact values of the
integrals

$$\int_0^{2\pi} \cos x\,dx, \qquad \int_0^{2\pi} \sin x\,dx.$$

(Use problem 19, chapter 6.)


13.7 Romberg Integration 


Theorem 13.6 states, in effect, that the trapezoidal approximation to
$\int_a^b f(x)\,dx$ calculated with the step h,

$$A(h) = T_{N(h)},$$

where $N(h) = (b - a)h^{-1}$, satisfies

$$A(h) = a_0 - C_{N(h)}^{(k)} + O(h^{2k+2}),$$

where $a_0$ is the exact value of the integral, and where $C_{N(h)}^{(k)}$ is defined by
(13-20). If we could show that for every positive integer k

(13-22) $C_{N(h)}^{(k)} = a_1 h^2 + a_2 h^4 + \cdots + a_k h^{2k} + O(h^{2k+2})$,

then A(h) would satisfy the hypotheses for successful k-fold extrapolation
to the limit, that is, it would be of the form (12-19) where $y = h^2$. We
might expect then to speed up the convergence of the trapezoidal values
by an application of algorithm 12.4.

By expanding the differences appearing in (13-20) in powers of h, it is
easy to see that $C_{N(h)}^{(k)}$ indeed can be expanded in the form (13-22). For
instance,

$$h\mu\delta f_0 = \frac{h}{2}(f_1 - f_{-1}) = h^2 f'(x_0) + \frac{h^4}{6} f'''(x_0) + \cdots,$$

$$h\mu\delta^3 f_0 = \frac{h}{2}(f_2 - 2f_1 + 2f_{-1} - f_{-2}) = h^4 f'''(x_0) + \cdots.$$


We thus may apply algorithm 12.4, keeping in mind that $y = h^2$. The
choices of the values $y_0$ and r are dictated by the obvious procedure to
begin with a single subinterval (N = 1), and then to double the number
of subintervals at each step. Since $y = h^2$ this means that the ratio
between consecutive values of y is $r = \tfrac{1}{4}$. The algorithm thus results in
the triangular array

    A_{0,0}
    A_{1,0}   A_{1,1}
    A_{2,0}   A_{2,1}   A_{2,2}
    A_{3,0}   A_{3,1}   A_{3,2}   A_{3,3}

defined by the recurrence relations

(13-23) $A_{m,0} = T_{2^m}$, m = 0, 1, 2, ...,

(13-24) $A_{m,n+1} = \dfrac{4^{n+1} A_{m,n} - A_{m-1,n}}{4^{n+1} - 1}$.

In order to define an economical algorithm, we note that of the ordinates
necessary to compute $T_{2N}$ only those whose index is odd have to be freshly
calculated. The others are known from $T_N$. The computation is
most easily arranged by introducing besides the trapezoidal values $T_N$ the
midpoint values

$$M_N = h(f_{1/2} + f_{3/2} + \cdots + f_{N-1/2}) = h \sum_{n=1}^{N} f\big(a + (n - \tfrac{1}{2})h\big) \qquad \left(h = \frac{b - a}{N}\right).$$

Like the trapezoidal value, the midpoint value is an approximation to the
desired integral that is in error by $O(h^2)$. We clearly have

$$T_{2N} = \frac{h}{2}\left(\tfrac{1}{2}f_0 + f_{1/2} + f_1 + f_{3/2} + \cdots + f_{N-1/2} + \tfrac{1}{2}f_N\right) = \tfrac{1}{2}(T_N + M_N).$$

Thus in the present situation algorithm 12.4 can be formulated as follows:

Algorithm 13.7 If the function f is defined on the interval [a, b],
generate the triangular array of numbers $A_{m,n}$ by means of the
recurrence relation (13-24) and

$$A_{0,0} = T_1 = \tfrac{1}{2}(b - a)[f(a) + f(b)],$$

$$M_{2^m} = h \sum_{n=1}^{2^m} f\big(a + (n - \tfrac{1}{2})h\big), \qquad h = 2^{-m}(b - a),$$

$$A_{m+1,0} = \tfrac{1}{2}(A_{m,0} + M_{2^m}), \qquad m = 0, 1, 2, \ldots.$$


The hypotheses of theorem 12.4 are satisfied when f satisfies the
hypotheses of theorem 13.6 for every k when N is sufficiently large. This is the
case if f has derivatives of all orders on an interval containing the interval
[a, b] in its interior. This condition, in turn, is satisfied if f is analytic
on the closed interval [a, b] (see Buck [1956], p. 78).

Theorem 13.7 Let the function f be analytic on the closed interval
[a, b]. Then all columns of the array generated by algorithm 13.7
converge to $\int_a^b f(x)\,dx$. If none of the coefficients $a_n$ in (13-22)
vanishes, each column converges faster than the preceding one.

The scheme generated by algorithm 13.7 is commonly known as the
Romberg scheme (see Bauer et al. [1963]).


EXAMPLE
3. For the problem considered in example 2 the Romberg scheme is as
follows:

Table 13.7

    1.16432 1734
    1.25919 0749   1.29081 3754
    1.28220 7763   1.28988 0101   1.28981 7857
    1.28792 0410   1.28982 4626   1.28982 0927   1.28982 0976
    1.28934 6003   1.28982 1201   1.28982 0973   1.28982 0973

The accuracy of extrapolation to the limit with 16 steps is about the same
as applying $T_{16}$ with "end correction." The advantage of the Romberg
procedure is that no decision has to be made in advance concerning the
best stepsize.
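
A compact rendering of algorithm 13.7 is sketched below. The function name `romberg` is hypothetical, and $J_0$ is again computed from its power series (an assumption made so that the example is self-contained) in order to reproduce the scheme of example 3.

```python
def j0(x, terms=30):
    """Bessel function J_0 by its power series (adequate for moderate |x|)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -(x * x / 4.0) / ((k + 1) ** 2)
    return total

def romberg(f, a, b, rows):
    """Romberg scheme of algorithm 13.7; returns the triangular array A[m][n]."""
    table = [[0.5 * (b - a) * (f(a) + f(b))]]            # A_{0,0} = T_1
    for m in range(rows - 1):
        n_sub = 2 ** m
        h = (b - a) / n_sub
        # Midpoint value M_{2^m}: only the new ordinates are evaluated.
        midpoint = h * sum(f(a + (i - 0.5) * h) for i in range(1, n_sub + 1))
        row = [0.5 * (table[m][0] + midpoint)]           # A_{m+1,0}
        for n in range(m + 1):
            p = 4.0 ** (n + 1)
            row.append((p * row[n] - table[m][n]) / (p - 1.0))   # (13-24)
        table.append(row)
    return table

scheme = romberg(j0, 0.0, 1.6, 5)
```

The first column of `scheme` contains the trapezoidal values $T_1, T_2, T_4, T_8, T_{16}$, and the later columns the extrapolated values.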

Problem 13 shows that it may happen that some or even all coefficients 
in the expansion (13-22) are zero. In such cases extrapolation to the 
limit will not speed up the convergence of the trapezoidal rule. 


Problems 


15. Verify experimentally that extrapolation to the limit does not speed up
the convergence of the trapezoidal rule for the integral

$$\int_0^1 \sqrt{x}\,dx.$$

Can you explain why?
16. Evaluate 


with an error of less than $10^{-8}$, using repeated extrapolation to the limit.




17. Show that the values $A_{m,1}$ obtained by algorithm 13.7 are identical with
the values obtained by applying Simpson's rule (see problem 4) to $2^m$
subintervals.

18. Express the values $A_{m,2}$ in terms of the ordinates $f_n$. Can you recognize a
familiar integration rule?


Recommended Reading 

Hildebrand [1956] gives an excellent account of Gaussian quadrature
in chapter 8. Many examples of the use of generating functions to obtain
integration coefficients are given in Henrici [1962], chapter 3; Bauer et al.
[1963] contains a definitive treatment of Romberg integration. Milne
[1949] gives numerous integration formulas not considered here and also
has a general treatment of the error in numerical integration.


Research Problems 

1. Make a comparative study of the effectiveness of the end correction 
versus repeated extrapolation to the limit.
2. Study the connection of the end correction with the Euler-MacLaurin 
summation formula. 


chapter 14 numerical solution of differential 
equations 


The mathematical formulation of many problems in science and engineer- 
ing leads to a relation between the values of an unknown function and the 
value of one or several of its derivatives at the same point. Such relations 
are called differential equations. The present chapter is devoted to the 
problem of finding numerical values of solutions of differential equations. 


14.1 Theoretical Preliminaries 


Let f = f(x, y) be a real-valued function of two real variables defined
for $a \le x \le b$, where a and b are finite, and for all real values of y. The
equation

(14-1) $y' = f(x, y)$

is called an ordinary differential equation of the first order; it symbolizes the
following problem: To find a function y = y(x), continuous and differentiable
for x ∈ [a, b], such that

$$y'(x) = f(x, y(x))$$

for all x ∈ [a, b]. A function y with this property is called a solution of
the differential equation (14-1).

EXAMPLE
1. The differential equation $y' = -y$ has the solutions $y(x) = Ce^{-x}$,
C = const.


As the above example shows, a given differential equation may have
many solutions. In order to pin down a solution, one must specify its
value at one point, say at x = a. The problem of finding a solution y of
(14-1) such that y(a) = s, where s is a given number, is called an initial
value problem and is schematically described by the equations

(14-2) $y' = f(x, y)$, $y(a) = s$.

EXAMPLE
2. The initial value problem $y' = -y$, $y(0) = 1$ has the solution
$y(x) = e^{-x}$.

In courses on differential equations much stress is placed on differential
equations whose solutions can be expressed in terms of elementary
functions, or of indefinite integrals of such functions. It is shown, for
instance, that any differential equation of the form $y' = g(x)y + p(x)$ can
be solved in this manner. The emphasis on explicitly solvable differential
equations tends to create the impression that almost any differential
equation can be solved in explicit form, if only the proper trick is found.
Nothing could be farther from the truth, however. Most differential
equations that occur in actual practice are much too complicated to admit
an explicit solution; if numerical values of the solution are desired, they
can only be found by special numerical methods.

EXAMPLE
3. The differential equation $y' = 1 - y$ can be solved explicitly; for the
equation $y' = 1 - y + \varepsilon y^2 \sin x$, which for small values of ε differs only
little from it, no such solution can be found.

Condition (14-3) resembles condition (4-4) imposed on functions
suitable for iteration, with the difference that the function f now depends
on the additional parameter x. The condition is again called a Lipschitz
condition.

Problems

1. Which of the following functions f satisfy a Lipschitz condition?
(a) f(x, y) = [formula illegible in the source];
(b) $f(x, y) = \dfrac{x^2}{1 + y^2}$, $0 \le x \le 1$;
(c) $f(x, y) = y^2$;
(d) $f(x, y) = \sqrt{|y|}$.

2. Show that the following condition is sufficient for f to satisfy (14-3): The
partial derivative $f_y$ exists and is continuous for $a \le x \le b$, $-\infty < y < \infty$,
and there exists a constant M such that

$$|f_y(x, y)| \le M$$

for all values of x and y in the indicated domain.


14.2 Numerical Integration by Taylor’s Expansion 


Throughout the balance of the present chapter we shall assume that the 
function f not only satisfies the conditions of the basic existence theorem 
14.1, but also that it possesses continuous derivatives with respect to both 
x and y of all orders required to justify the analytical operations to be 
performed. We shall refer to this property by the phrase “‘/ is sufficiently 
differentiable.” 

It is a basic fact in the theory of ordinary differential equations that if f 
is sufficiently differentiable, all derivatives of a solution of the differential 
equation (14-1) are expressible in terms of the function fand its derivatives. 
In fact, this statement is obviously true for the first derivative. Assuming 
its truth for the nth derivative, we write 


(14-4) ee = FO Bs 


Where {τ Ὁ, the (a — 1)st total derivative of f with respect to x, is a 
certain combination of derivatives of f. Differentiating the identity 


γ»"ὺ) = f(x, YX) 
Once more with respect to x, we have 
»π 0) = f°, VOY) + F"~ PO, γοῦν» Ὁ) 


| The fact that a differential equation does not possess an explicit solution 
i does not mean that the solution does not exist, in the mathematical sense. 
Ϊ The following existence theorem is proved in courses on differential 
| equations (see Coddington [1961], p. 217). 


| \] Theorem 14.1 Let the function f= f(x,y) be continuous for 
" ΩΠ ΞΧΞῥ, --οὦ < γ < οὐ, and let there exist a constant Γ, such that, 
i for any two numbers y, z and all x € [a, d], 


(14-3) fs ») — £0, 2)| Ξ Ly -- 2]: 
| | 


Then, whatever the initial value s, the initial value problem (14-2) 
possesses a unique solution y = p(x) for xe [a, ὁ]. 


approximations. Although this is, in a sense, a constructive method for 
proving the existence of a solution, the method 15 not suitable for the 
numerical computation of the solution, because it requires the evaluation 
} of infinitely many indefinite integrals. The methods to be given below 
| are far more economical and are therefore preferred in practice. 


This theorem can be proved, for instance, by the method of successive 


and thus, in view of $y'(x) = f(x, y(x))$,

    $y^{(n+1)}(x) = f^{(n)}(x, y(x)),$

where

(14-5)    $f^{(n)}(x, y) = f_x^{(n-1)}(x, y) + f_y^{(n-1)}(x, y)\, f(x, y).$

We thus have


Algorithm 14.2 If $f$ is sufficiently differentiable, the $(m + 1)$st derivative of a
solution $y$ of (14-1) is expressible in the form

    $y^{(m+1)}(x) = f^{(m)}(x, y(x)), \quad m = 0, 1, 2, \ldots,$

where the functions $f^{(m)}$ can be calculated by means of the recurrence
relation

    $f^{(0)} = f, \quad f^{(m+1)} = f_x^{(m)} + f_y^{(m)} f, \quad m = 0, 1, 2, \ldots.$

In place of $f^{(1)}$ and $f^{(2)}$ we shall also write $f'$ and $f''$.


EXAMPLES


4.  $f' = f_x + f_y f,$
    $f'' = f_{xx} + f_{xy} f + (f_{yx} + f_{yy} f) f + f_y (f_x + f_y f)$
    $\phantom{f''} = f_{xx} + 2 f_{xy} f + f_{yy} f^2 + (f_x + f_y f) f_y.$


5. Let $f(x, y) = x^2 + y^2$. We find

    $f'(x, y) = 2x + 2y(x^2 + y^2),$
    $f''(x, y) = 2 + 4xy + (2x^2 + 6y^2)(x^2 + y^2).$

In view of the fact that the derivatives of the solution of a differential
equation can be determined whenever the derivatives of $f$ can be calculated
(which is, in principle, always the case if $f$ is an elementary function of $x$
and $y$), one might think of approximating the solution $y$ of the initial
value problem (14-2) by its Taylor series at $x = a$. In view of the initial
condition $y(a) = s$ this series takes the form

(14-6)    $y(x) = s + \frac{x - a}{1!} f(a, s) + \frac{(x - a)^2}{2!} f'(a, s) + \frac{(x - a)^3}{3!} f''(a, s) + \cdots.$

However, the above examples suggest that unless the function $f$ is very
simple, the higher total derivatives of $f$ rapidly increase in complexity.
This circumstance makes it necessary to truncate the infinite series (14-6)
after very few terms. This necessarily restricts the range of values
of $x$ over which the truncated series (14-6) can be expected to define a
good approximation of the solution $y$. One therefore will use the truncated
series with a small value of $x - a$, $x - a = h$ say, and re-evaluate the
derivatives $f, f', f'', \ldots$ at the point $x = a + h$.
Problems 

3. Calculate the $n$th total derivative of $f$ if

(a) f(x,y) = y + e*, (δ) f(x,y) = εὖ. 
4. If $f(x, y) = (2y/x) - 1$, show that $f''$ is identically zero. What is the
explanation of this fact?

14.3. The Taylor Algorithm 


In order to formalize the procedure outlined in the preceding section we
introduce the following notation. Let $h > 0$ be a constant, and set

    $x_n = a + nh, \quad n = 0, 1, 2, \ldots.$

We shall always denote by $y_n$ a number intended to approximate $y(x_n)$,
the value of the exact solution of the initial value problem (14-2) at
$x = x_n$. If the method of Taylor expansion is used, these values are
calculated according to the following scheme: Let, for some fixed integer
$p \ge 1$,

(14-7)    $T_p(x, y; h) = f(x, y) + \frac{h}{2!} f'(x, y) + \cdots + \frac{h^{p-1}}{p!} f^{(p-1)}(x, y).$

The numbers $y_n$ are then calculated successively as follows:


Algorithm 14.3 For a fixed value of $h$, generate the sequence of
numbers $\{y_n\}$ by the recurrence relation

(14-8a)    $y_0 = s,$
(14-8b)    $y_{n+1} = y_n + h\, T_p(x_n, y_n; h), \quad n = 0, 1, 2, \ldots.$

This scheme is known as the Taylor algorithm of order $p$.


EXAMPLE


6. The Taylor algorithm of order one is particularly simple. In view of
$T_1(x, y; h) = f(x, y)$, (14-8b) then becomes

(14-9)    $y_{n+1} = y_n + h f(x_n, y_n).$

This is also known as the Euler method, or as the Euler-Cauchy method.
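
Algorithm 14.3 can be sketched in a few lines of Python. The sketch below is only an illustration under stated assumptions: the function names `taylor_increment` and `taylor_solve` are hypothetical, the caller must supply the total derivatives $f, f', \ldots, f^{(p-1)}$ by hand, and the sample problem reuses $f(x, y) = x^2 + y^2$ with the derivatives found in example 5.

```python
from math import factorial

def taylor_increment(derivs, x, y, h):
    """T_p(x, y; h) of (14-7); derivs = [f, f', ..., f^(p-1)] as callables."""
    return sum(h**j / factorial(j + 1) * d(x, y) for j, d in enumerate(derivs))

def taylor_solve(derivs, a, s, h, steps):
    """Algorithm 14.3: y_0 = s, y_{n+1} = y_n + h * T_p(x_n, y_n; h)."""
    y = s
    values = [y]
    for n in range(steps):
        x = a + n * h                      # x_n = a + n h
        y = y + h * taylor_increment(derivs, x, y, h)
        values.append(y)
    return values

# f(x, y) = x^2 + y^2 with the total derivatives f', f'' of example 5
f0 = lambda x, y: x**2 + y**2
f1 = lambda x, y: 2*x + 2*y*(x**2 + y**2)
f2 = lambda x, y: 2 + 4*x*y + (2*x**2 + 6*y**2)*(x**2 + y**2)

euler = taylor_solve([f0], 0.0, 0.0, 0.1, 10)           # order 1, i.e. (14-9)
third = taylor_solve([f0, f1, f2], 0.0, 0.0, 0.1, 10)   # order 3
print(euler[-1], third[-1])
```

Passing a list with a single entry reproduces the Euler method; longer lists raise the order at the cost of supplying more derivatives.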


Naturally we expect the values defined by (14-8) to approximate the
corresponding values $y(x_n)$ of the exact solution, particularly when $h$ is
small. To see whether this is the case, let us examine the purely academic
example

(14-10)    $y' = y, \quad y(0) = 1.$

The exact solution, of course, is $y(x) = e^x$. In view of

    $f(x, y) = f'(x, y) = f''(x, y) = \cdots = y,$

the function $T_p$ takes the simple form

    $T_p(x, y; h) = \left(1 + \frac{h}{2!} + \cdots + \frac{h^{p-1}}{p!}\right) y.$

The recurrence relation (14-8) thus becomes

    $y_{n+1} = \left(1 + \frac{h}{1!} + \frac{h^2}{2!} + \cdots + \frac{h^p}{p!}\right) y_n.$

This is a difference equation of order 1 (see chapter 3); the solution
satisfying the initial condition $y_0 = 1$ is given by

(14-11)    $y_n = \left(1 + \frac{h}{1!} + \frac{h^2}{2!} + \cdots + \frac{h^p}{p!}\right)^n, \quad n = 0, 1, 2, \ldots.$

We wish to examine how closely $y_n$ approximates $y(x_n) = e^{x_n}$ at a fixed
point $x$ of the interval of integration, if the approximation $y_n$ is calculated
by successively smaller integration steps $h$. We thus must let $h \to 0$ and
$n \to \infty$ simultaneously in such a manner that $x = x_n = nh$ remains fixed.
By Taylor's formula with remainder term (see Taylor [1959], p. 471)

    $e^h = 1 + \frac{h}{1!} + \frac{h^2}{2!} + \cdots + \frac{h^p}{p!} + \frac{h^{p+1}}{(p+1)!} e^{\theta h},$

where $\theta$ is some unspecified number between 0 and 1. We now use the
fact that for any two real numbers $A$ and $B$ such that $A \ge B \ge 0$ and for
$n = 1, 2, \ldots$

    $A^n - B^n = (A - B)(A^{n-1} + A^{n-2}B + \cdots + B^{n-1}) \le (A - B)\, n A^{n-1}.$

In particular, taking

    $A = e^h, \quad B = 1 + \frac{h}{1!} + \frac{h^2}{2!} + \cdots + \frac{h^p}{p!},$

we obtain

    $|y_n - y(x_n)| = e^{nh} - \left(1 + \frac{h}{1!} + \cdots + \frac{h^p}{p!}\right)^n \le \frac{h^{p+1}}{(p+1)!}\, e^{\theta h}\, n\, e^{(n-1)h}$

or, in view of $nh = x_n$, $0 < \theta < 1$,

(14-12)    $|y_n - y(x_n)| \le \frac{h^p}{(p+1)!}\, x_n e^{x_n}.$

This relation shows that, at a fixed point $x_n = x$, the error in the approximation
of the exact solution $y$ by the Taylor algorithm of order $p$ tends to
zero like $h^p$ as the stepsize $h$ tends to zero.

The convergence if the integration steps are successively halved is
illustrated in figure 14.3.

Figure 14.3
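
The estimate (14-12) can be checked numerically for the model problem (14-10). The following sketch evaluates the closed form (14-11) and compares the actual error at $x = 1$ with the right-hand side of (14-12); the helper names `taylor_value` and `error_bound` and the chosen orders and step counts are illustrative assumptions, not part of the text.

```python
from math import exp, factorial

def taylor_value(p, h, n):
    """y_n of (14-11) for y' = y, y(0) = 1, Taylor algorithm of order p."""
    growth = sum(h**j / factorial(j) for j in range(p + 1))
    return growth**n

def error_bound(p, h, x):
    """Right-hand side of (14-12)."""
    return h**p / factorial(p + 1) * x * exp(x)

x = 1.0
rows = []
for p in (1, 2, 4):
    for n in (4, 8, 16, 32):               # successively halved steps h = x / n
        h = x / n
        err = exp(x) - taylor_value(p, h, n)
        rows.append((p, h, err, error_bound(p, h, x)))

for p, h, err, bound in rows:
    print(f"p={p}  h={h:<8}  error={err:.3e}  bound={bound:.3e}")
```

For fixed $p$, halving $h$ should shrink the error by a factor close to $2^p$, which is the behavior that (14-12) predicts.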

We shall now derive a similar bound for the error $y_n - y(x_n)$ of the
Taylor approximation $y_n$ to the solution $y$ of an arbitrary initial value
problem (14-2). To this end we make the following assumptions:

(i) Not only the function $f$, but also the function $T_p$ defined by (14-7)
satisfies a Lipschitz condition with Lipschitz constant $L$:

(14-13)    $|T_p(x, y; h) - T_p(x, z; h)| \le L|y - z|$

for any $y$, $z$ and all $x$ and $h$ such that $x \in [a, b]$, $x + h \in [a, b]$;

(ii) The $(p + 1)$st derivative of the exact solution $y$ of (14-2) is
continuous and hence bounded on the closed interval $[a, b]$, and

(14-14)    $|y^{(p+1)}(x)| \le Y_{p+1}, \quad x \in [a, b].$

We then have:


Theorem 14.3 Under the above assumptions (i) and (ii), the error
$z_n = y_n - y(x_n)$ of the Taylor algorithm of order $p$ is bounded as
follows:

(14-15)    $|z_n| \le h^p\, \frac{Y_{p+1}}{(p + 1)!} \cdot \frac{e^{L(x_n - a)} - 1}{L}.$

The result shows that $z_n$ again tends to zero at least like $h^p$ as $x_n = x$ is
fixed and $h \to 0$.


Proof. By Taylor's formula with remainder, the exact solution satisfies

    $y(x_{n+1}) = y(x_n + h) = y(x_n) + h\, T_p(x_n, y(x_n); h) + \frac{h^{p+1}}{(p + 1)!}\, y^{(p+1)}(\xi_n),$

where $\xi_n$ is some point between $x_n$ and $x_{n+1}$. Subtracting the last relation
from (14-8b), we get

(14-16)    $z_{n+1} = z_n + h\,[T_p(x_n, y_n; h) - T_p(x_n, y(x_n); h)] - \frac{h^{p+1}}{(p + 1)!}\, y^{(p+1)}(\xi_n)$

and hence

    $|z_{n+1}| \le |z_n| + h\,|T_p(x_n, y_n; h) - T_p(x_n, y(x_n); h)| + \frac{h^{p+1}}{(p + 1)!}\, |y^{(p+1)}(\xi_n)|.$

Using (14-13) and (14-14), the expression on the right can be simplified,
with the following result:

(14-17)    $|z_{n+1}| \le |z_n| + hL|z_n| + h^{p+1}\, \frac{Y_{p+1}}{(p + 1)!}.$

We now make use of the following auxiliary result: 


Lemma 14.3 Let the elements of a sequence $\{w_n\}$ satisfy the
inequalities

(14-18)    $w_{n+1} \le (1 + a) w_n + B, \quad n = 0, 1, 2, \ldots,$

where $a$ and $B$ are certain positive constants, and let $w_0 = 0$.
Then

(14-19)    $w_n \le B\, \frac{e^{na} - 1}{a}, \quad n = 0, 1, 2, \ldots.$


Proof of lemma 14.3. The relation (14-19) is evidently true for $n = 0$.
Assuming its truth for some nonnegative integer $n$ we have from (14-18),
in view of $1 + a \le e^a$,

    $w_{n+1} \le (1 + a)\, B\, \frac{e^{na} - 1}{a} + B = B\, \frac{(1 + a) e^{na} - 1}{a} \le B\, \frac{e^{(n+1)a} - 1}{a},$

establishing (14-19) with $n$ increased by one. The truth of the lemma now
follows by the principle of induction.

Returning to the proof of theorem 14.3, we apply the lemma to the
relation (14-17), setting $w_n = |z_n|$, $a = hL$, $B = h^{p+1} Y_{p+1}/(p + 1)!$.
By virtue of (14-8a) and (14-2), $|z_0| = 0$. Thus the conclusion (14-19)
applies, yielding (14-15).
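
Lemma 14.3 lends itself to a quick numerical sanity check. The sketch below iterates the worst case of (14-18), that is, with equality, and compares it with the bound (14-19); the function names and the sample constants `a = 0.05`, `B = 1e-3` are arbitrary illustrative choices.

```python
from math import exp

def worst_case_sequence(a, B, n_max):
    """w_{n+1} = (1 + a) w_n + B with w_0 = 0, i.e. equality in (14-18)."""
    w = [0.0]
    for _ in range(n_max):
        w.append((1 + a) * w[-1] + B)
    return w

def lemma_bound(a, B, n):
    """Right-hand side of (14-19)."""
    return B * (exp(n * a) - 1) / a

a, B = 0.05, 1e-3
w = worst_case_sequence(a, B, 40)
gaps = [lemma_bound(a, B, n) - w[n] for n in range(41)]
print(min(gaps), max(gaps))
```

The gap between bound and sequence is nonnegative and grows with $n$, reflecting the replacement of $1 + a$ by $e^a$ in the induction step.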


Problems 
5. Solve the initial value problem 


Pola. WO =e, 


by the Taylor algorithm of order $p = 2$, using the steps $h = 0.5$, $h = 0.25$,
and $h = 0.125$, and compare the values of the numerical solution at
$x_n = 1$ with the values of the exact solution.

6. Find an analytical expression for the values yn defined by (14-9) for the 
initial value problem 

ak ΣΙ 2y | 
and verify the statement of theorem 14.3. [Hint: Use theorem 3.3 to 
solve the difference equation involved. ] 


14.4 Extrapolation to the Limit 


The estimate for the error $z_n = y_n - y(x_n)$ given in theorem 14.3 should
not be regarded as a realistic indication of the actual size of the error. It
merely serves to prove the convergence of the Taylor algorithm and to
indicate the order of magnitude of the error. The estimate states, in
effect, that there exists a constant $K$ so that

(14-20)    $|z_n| \le K h^p$

for all $x_n \in [a, b]$. This relation shows that the error tends to zero as
$h \to 0$. Can we make a statement about the manner in which the error
tends to zero? As we have observed repeatedly, such knowledge is
helpful for the purpose of speeding up the convergence of the method as
$h \to 0$.

We shall prove:


Theorem 14.4 If $f$ is sufficiently differentiable, the errors $z_n = y_n - y(x_n)$
of the values defined by (14-8) satisfy

(14-21)    $z_n = h^p z(x_n) + O(h^{p+1}),$

where $z$ denotes the solution of the initial value problem

(14-22)    $z' = f_y(x, y(x))\, z - \frac{1}{(p + 1)!}\, y^{(p+1)}(x), \quad z(a) = 0.$


Proof. We start from relation (14-16) above. By Taylor's theorem,

    $T_p(x_n, y_n; h) - T_p(x_n, y(x_n); h) = T_{p,y}(x_n, y(x_n); h)\, z_n + O(z_n^2)$
    $\phantom{T_p} = T_{p,y}(x_n, y(x_n); 0)\, z_n + O(h z_n) + O(z_n^2)$
    $\phantom{T_p} = f_y(x_n, y(x_n))\, z_n + O(h^{p+1}),$

where $T_{p,y}$ denotes the partial derivative of $T_p$ with respect to $y$, since
$T_p(x, y; 0) = f(x, y)$. Also, $y^{(p+1)}(\xi_n) = y^{(p+1)}(x_n) + O(h)$. We
thus have

    $z_{n+1} = z_n + h\left[f_y(x_n, y(x_n))\, z_n - \frac{h^p}{(p + 1)!}\, y^{(p+1)}(x_n)\right] + O(h^{p+2}).$

From this we subtract $h^p$ times the relation

    $z(x_{n+1}) = z(x_n) + h z'(x_n) + O(h^2)$
    $\phantom{z(x_{n+1})} = z(x_n) + h\left[f_y(x_n, y(x_n))\, z(x_n) - \frac{1}{(p + 1)!}\, y^{(p+1)}(x_n)\right] + O(h^2),$

which follows from the definition of $z$. Setting

    $w_n = z_n - h^p z(x_n)$

we obtain

    $|w_{n+1}| \le |w_n| + hL|w_n| + O(h^{p+2}).$

Since $w_0 = 0$, lemma 14.3 now yields the relation $w_n = O(h^{p+1})$ equivalent
to (14-21).

In order to make explicit the dependence of the numerical values $y_n$ on
the step $h$ with which they are calculated, we shall denote (in this section
only) by $y(x, h)$ the approximation to $y(x)$ calculated with the step $h$.
Relation (14-21) can then be written more explicitly as follows:

(14-23)    $y(x, h) = y(x) + h^p z(x) + O(h^{p+1}).$

In general, both quantities $y(x_n)$ and $z(x_n)$ are unknown in this relation,
the latter because it depends on the solution of a differential equation
involving the unknown function $y$. However, the mere fact that a
relation of the form (14-21) holds can be made the basis for an extrapolation
to the limit procedure. Eliminating $z(x_n)$ between (14-23) and
the similar relation

    $y(x, r^{-1}h) = y(x) + r^{-p} h^p z(x) + O(h^{p+1})$

for the numerical approximation calculated with the step $r^{-1}h$, where
$r \ne 1$, we find that the quantity

(14-24)    $y^*(x, h) = \frac{r^p\, y(x, r^{-1}h) - y(x, h)}{r^p - 1}$

differs from $y(x)$ by only $O(h^{p+1})$. As $h \to 0$, the values $y^*(x, h)$ will thus
converge to $y(x)$ faster than the values $y(x, h)$. Here $r$ is, in principle,
arbitrary; for practical reasons one usually chooses $r = 2$.
Under the assumption that (14-23) holds in the generalized form

(14-25)    $y(x, h) = y(x) + a_p(x) h^p + a_{p+1}(x) h^{p+1} + \cdots + a_k(x) h^k + O(h^{k+1})$

for every $k > p$, we can speed up the values $y^*(x, h)$ once more, using
(14-24) with $p$ replaced by $p + 1$. Continuing the process systematically
in the manner of algorithm 12.4, we obtain


Algorithm 14.4 For a fixed $x$ in $[a, b]$, form the triangular array of
numbers $A_{m,n}$ as follows:

    $A_{m,0} = y(x, 2^{-m}(x - a)), \quad m = 0, 1, 2, \ldots;$

    $A_{m,n+1} = \frac{2^{p+n} A_{m,n} - A_{m-1,n}}{2^{p+n} - 1}, \quad n = 0, 1, \ldots, m - 1.$


The fact that (14-25) holds for sufficiently differentiable functions $f$ was
established by Gragg [1963]. If this is taken for granted, we can deduce
as in theorem 12.4 that each succeeding column of the array $A_{m,n}$ converges
faster to $y(x)$.

EXAMPLE

7. We integrate the initial value problem

    $y' = x - y^2, \quad y(0) = 0$

by Euler's method, using the steps $h = 0.8, 0.4, 0.2, 0.1, 0.05$. The table
below shows the resulting values at $x = 0.8$, and the values obtained by
repeated extrapolation to the limit.


Table 14.4

  h       A_{m,0}    A_{m,1}    A_{m,2}    A_{m,3}    A_{m,4}

  0.8     0.00000
  0.4     0.16000    0.32000
  0.2     0.23682    0.31363    0.31151
  0.1     0.27201    0.30720    0.30505    0.30413
  0.05    0.28871    0.30541    0.30481    0.30478    0.30482
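
The construction of such a table can be sketched as follows. The code applies Euler's method ($p = 1$) with the steps $2^{-m}(x - a)$ and then fills the triangular array by the recurrence $A_{m,n+1} = (2^{p+n} A_{m,n} - A_{m-1,n})/(2^{p+n} - 1)$; the function names are illustrative assumptions, and the printed rows can be compared with Table 14.4.

```python
def euler_value(f, a, s, x_end, steps):
    """Euler's method (14-9) from x = a to x = x_end in `steps` equal steps."""
    h = (x_end - a) / steps
    y = s
    for n in range(steps):
        y += h * f(a + n * h, y)
    return y

def extrapolation_table(f, a, s, x_end, rows, p=1):
    """Triangular array A[m][n] built on a method of order p (here Euler)."""
    table = []
    for m in range(rows):
        row = [euler_value(f, a, s, x_end, 2**m)]   # step (x_end - a) / 2^m
        for n in range(m):
            factor = 2 ** (p + n)
            row.append((factor * row[n] - table[m - 1][n]) / (factor - 1))
        table.append(row)
    return table

f = lambda x, y: x - y**2
table = extrapolation_table(f, 0.0, 0.0, 0.8, 5)
for m, row in enumerate(table):
    print(f"h={0.8 / 2**m:<6}", "  ".join(f"{v:.5f}" for v in row))
```

Each row costs one additional Euler integration; the extrapolated columns reuse only values that are already in the table.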
Problems 


7. Let the initial value problem 


3 πε is oh ΨΙ͂ΒΕΞ 


be solved by the Taylor algorithm of order 1. Where does the absolute 
value of the error (approximately) attain its maximum value? Verify 
your result numerically by performing a numerical integration using the 
step $h = 0.1$.
8. Improve the values obtained in problem 5 by extrapolation to the limit. 
9. Devise a method for extrapolation to the limit if p, the order of the 
method, is not known. 


14.5 Methods of Runge-Kutta Type 


The practical application of the Taylor algorithm described in the
preceding sections is frequently tedious because of the necessity of evaluating
the functions $f', f'', \ldots$ at each integration step. It therefore is a
remarkable fact that formulas exist that produce values $y_n$ of the same
accuracy as some of the Taylor algorithms of order $p \ge 2$, but without
requiring the evaluation of any derivatives. We mention only a few of
the many available formulas of this type.

(i) The simplified Runge-Kutta method. Here we replace the function
$T_2$ in algorithm 14.3 by $K_2$, where

(14-26)    $K_2(x, y; h) = \tfrac{1}{2}\,[f(x, y) + f(x + h, y + h f(x, y))].$

Clearly, no evaluation of $f'$ is necessary. Instead, the function $f$ is
evaluated twice at each step. It can be shown that

(14-27)    $K_2(x, y; h) - T_2(x, y; h) = O(h^2),$

and that the accumulated error satisfies an inequality of the form (14-20),
and, with a suitable definition of $z$, a relation of the form of (14-21), where
$p = 2$.

(ii) The classical Runge-Kutta method. Here the function $T_4$ in
algorithm 14.3 is replaced by $K_4$, a sum of four values of the function $f$,
defined as follows:

(14-28)    $K_4(x, y; h) = \tfrac{1}{6}\,[k_1 + 2k_2 + 2k_3 + k_4],$

where

    $k_1 = f(x, y),$
    $k_2 = f\!\left(x + \tfrac{h}{2},\; y + \tfrac{h}{2} k_1\right),$
    $k_3 = f\!\left(x + \tfrac{h}{2},\; y + \tfrac{h}{2} k_2\right),$
    $k_4 = f(x + h,\; y + h k_3).$

It can be shown that

    $K_4(x, y; h) = T_4(x, y; h) + O(h^4)$

and that, as a consequence, the relations (14-20) and (14-21) now hold with
$p = 4$. (The proofs of these statements are extremely complicated.) In
view of this fact, the classical Runge-Kutta method is one of the most
widely used methods for the numerical integration of differential equations.

(iii) A mixed Runge-Kutta-Taylor method. If the first total derivative
of $f$ is easily evaluated, we may use in place of $T_3$ in algorithm 14.3 the
function $G_3(x, y; h)$ defined by

(14-29)    $G_3(x, y; h) = f(x, y) + \frac{h}{2}\, f'\!\left(x + \frac{h}{3},\; y + \frac{h}{3} f(x, y)\right).$

It can be shown that $G_3(x, y; h) - T_3(x, y; h) = O(h^3)$ and that, as a
consequence, (14-20) and (14-21) are true for $p = 3$.
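
The two derivative-free increment functions $K_2$ and $K_4$ translate directly into code. The following is a minimal sketch, assuming a scalar equation and a fixed step; the names `k2_increment`, `k4_increment`, and `integrate` are hypothetical, and the sample problem $y' = x - y^2$, $y(0) = 0$ is borrowed from the examples of this chapter.

```python
def k2_increment(f, x, y, h):
    """K_2 of (14-26): simplified Runge-Kutta, two evaluations of f."""
    k1 = f(x, y)
    return 0.5 * (k1 + f(x + h, y + h * k1))

def k4_increment(f, x, y, h):
    """K_4 of (14-28): classical Runge-Kutta, four evaluations of f."""
    k1 = f(x, y)
    k2 = f(x + h / 2, y + h / 2 * k1)
    k3 = f(x + h / 2, y + h / 2 * k2)
    k4 = f(x + h, y + h * k3)
    return (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(increment, f, a, s, h, steps):
    """Algorithm 14.3 with T_p replaced by the given increment function."""
    y = s
    for n in range(steps):
        y += h * increment(f, a + n * h, y, h)
    return y

f = lambda x, y: x - y**2
y_k2 = integrate(k2_increment, f, 0.0, 0.0, 0.1, 8)   # value at x = 0.8
y_k4 = integrate(k4_increment, f, 0.0, 0.0, 0.1, 8)
print(y_k2, y_k4)
```

Because only the increment function changes, the same driver loop serves the Taylor algorithm and both Runge-Kutta variants.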


Problems 


10. Solve the initial value problem

    $y' = x - y^2, \quad y(0) = 0$

for $0 \le x \le 0.8$ by the classical Runge-Kutta method, using the step
$h = 0.1$. Also, perform the same integration with the steps $h = 0.2$ and
$h = 0.4$, and perform an extrapolation to the limit.

11. Establish the relation (14-27).

12. Show that for the differential equation $y' = Ay$ ($A = \text{const.}$) the values
$y_n$ produced by the methods $K_2$ and $K_4$ are identical, respectively, with the
values produced by $T_2$ and $T_4$.

13. Prove that $G_3$ differs from $T_3$ by $O(h^3)$.


14.6 Methods Based on Numerical Integration: The Adams-Bashforth Method 


All the methods discussed so far are based directly or indirectly on the
idea of expanding the exact solution in a Taylor series. A different
approach to the problem can be based on the idea of numerical integration.
A solution $y$ of (14-1) by definition satisfies the identity

    $y'(x) = f(x, y(x)).$

Integrating between the limits $x_n$ and $x_{n+1}$ we obtain

(14-30)    $y(x_{n+1}) = y(x_n) + \int_{x_n}^{x_{n+1}} f(x, y(x))\, dx.$

Let us now suppose that we have somehow already obtained approximate
values $y_0, y_1, \ldots, y_n$ of the solution $y$ at the points $x_0, x_1, \ldots, x_n$, where
we again assume the points $x_m$ to be equally spaced, $x_m = a + mh$. The
approximate values of $f(x, y(x))$ at these points then are

    $f_m = f(x_m, y_m), \quad m = 0, 1, \ldots, n.$

If $k \le n$, we can use these values to approximate $f(x, y(x))$ by the Newton
backward polynomial

    $P_k(x) = \sum_{m=0}^{k} (-1)^m \binom{-s}{m} \nabla^m f_n, \quad s = \frac{x - x_n}{h}.$

Performing the integration in (14-30) on $P_k$ in place of $f$, we obtain the
following algorithm for computing a new value $y_{n+1}$:


Algorithm 14.6 (The Adams-Bashforth method.) For a given
function $f = f(x, y)$, a given constant $h > 0$, a given integer $k \ge 0$,
and given values $y_0, y_1, \ldots, y_k$, let $f_m = f(x_m, y_m)$ $(m = 0, 1, 2, \ldots, k)$,
and generate the sequence $\{y_n\}$ recursively from the formulas

(14-31)    $x_{n+1} = x_n + h,$
           $y_{n+1} = y_n + h\,[c_0 f_n + c_1 \nabla f_n + \cdots + c_k \nabla^k f_n],$

and

    $f_{n+1} = f(x_{n+1}, y_{n+1}),$
    $\nabla^p f_{n+1} = \nabla^{p-1} f_{n+1} - \nabla^{p-1} f_n, \quad p = 1, 2, \ldots, k,$

where $n = k, k + 1, \ldots$.


Here the coefficients $c_m$ are defined by (13-5); for numerical values see
table 13.4b. The working of algorithm 14.6 is illustrated by scheme 14.6
for $k = 3$.


    x_0   y_0   f_0
                      ∇f_1
    x_1   y_1   f_1           ∇²f_2
                      ∇f_2            ∇³f_3
    x_2   y_2   f_2           ∇²f_3
                      ∇f_3
    x_3   y_3   f_3

    Scheme 14.6


As soon as $y_{n+1}$ is known, $f_{n+1}$ can be calculated and a new diagonal of
differences may be formed. The process requires exactly one evaluation
of $f$ per step, no matter how many differences are carried. This compares
favorably with methods such as the classical Runge-Kutta method, which
requires four evaluations.

Algorithm 14.6 does not say how to obtain the starting values $y_0, y_1, \ldots,
y_k$. This can be done by any of the methods discussed in §14.3 or
§14.5; other starting methods, more in the spirit of the algorithm itself,
are also known (see Collatz [1960], p. 81).
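
The difference form of the Adams-Bashforth method can be sketched as below. Two points are assumptions of this sketch rather than statements of the text: the coefficients are the standard values $c_0 = 1$, $c_1 = 1/2$, $c_2 = 5/12$, $c_3 = 3/8$ (the text's table 13.4b is not reproduced here), and the starting values are simply taken from the known exact solution of a test problem. The function names are hypothetical.

```python
from math import exp

# Assumed standard Adams-Bashforth difference coefficients c_0 .. c_3
C = [1.0, 1 / 2, 5 / 12, 3 / 8]

def backward_differences(fs, k):
    """Return [f_k, nabla f_k, ..., nabla^k f_k] from fs = [f_0, ..., f_k]."""
    column = list(fs)
    diffs = [column[-1]]
    for _ in range(k):
        column = [column[i + 1] - column[i] for i in range(len(column) - 1)]
        diffs.append(column[-1])
    return diffs

def adams_bashforth(f, a, start, h, k, steps):
    """Algorithm 14.6 (k <= 3): start = [y_0, ..., y_k]; returns [y_0, ..., y_steps]."""
    ys = list(start)
    fs = [f(a + m * h, ys[m]) for m in range(k + 1)]
    diffs = backward_differences(fs, k)        # differences at the newest point x_n
    for n in range(k, steps):
        y_new = ys[n] + h * sum(C[p] * diffs[p] for p in range(k + 1))
        new = [f(a + (n + 1) * h, y_new)]      # the single evaluation of f per step
        for p in range(1, k + 1):
            new.append(new[p - 1] - diffs[p - 1])   # new diagonal of differences
        diffs = new
        ys.append(y_new)
    return ys

f = lambda x, y: -y
h = 0.1
start = [exp(-m * h) for m in range(4)]        # exact starting values, for illustration only
ys = adams_bashforth(f, 0.0, start, h, 3, 10)
print(ys[-1], exp(-1.0))
```

Only the newest diagonal of the difference table is stored, which mirrors the way scheme 14.6 is extended by one diagonal per step.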

Let us now study the error $z_n = y_n - y(x_n)$ of the values defined by
(14-31) under the assumption that the starting values are in error by at
most $\delta$:

(14-32)    $|z_m| = |y_m - y(x_m)| \le \delta, \quad m = 0, 1, \ldots, k.$

Expressing differences in terms of ordinates, (14-31) appears in the form

    $y_{n+1} = y_n + h\{b_{k0} f_n + b_{k1} f_{n-1} + \cdots + b_{kk} f_{n-k}\},$

where the $b_{km}$ are certain constants, involving the $c_m$, that need not be
specified. The corresponding relation for the exact solution is, by (13-6),

    $y(x_{n+1}) = y(x_n) + h\{b_{k0}\, y'(x_n) + b_{k1}\, y'(x_{n-1}) + \cdots + b_{kk}\, y'(x_{n-k})\} + h^{k+2} c_{k+1}\, y^{(k+2)}(\xi_n),$

where $\xi_n$ is a point between $x_{n-k}$ and $x_{n+1}$. We now subtract the last
two relations from each other. Observing that

    $|f_m - y'(x_m)| = |f(x_m, y_m) - f(x_m, y(x_m))| \le L|y_m - y(x_m)| = L|z_m|,$

where $L$ denotes the Lipschitz constant of the function $f$, and assuming
that

(14-33)    $|y^{(k+2)}(x)| \le Y_{k+2}, \quad a \le x \le b,$

we obtain

    $|z_{n+1}| \le |z_n| + hL\{|b_{k0}||z_n| + |b_{k1}||z_{n-1}| + \cdots + |b_{kk}||z_{n-k}|\} + h^{k+2} |c_{k+1}|\, Y_{k+2}.$

Our aim is to find an explicit bound for the quantities $|z_n|$. An induction
argument shows that

    $|z_n| \le w_n,$

where $\{w_n\}$ is any solution of the difference equation

(14-34)    $w_{n+1} = w_n + hL\{|b_{k0}| w_n + |b_{k1}| w_{n-1} + \cdots + |b_{kk}| w_{n-k}\} + h^{k+2} |c_{k+1}|\, Y_{k+2}$

satisfying

(14-35)    $w_m \ge \delta, \quad m = 0, 1, \ldots, k.$

We try to find such a solution using the principles set forth in §6.7.
Since the nonhomogeneous term in (14-34) is a constant, the equation
clearly has the particular solution $w_n = -C$, where

(14-36)    $C = \frac{h^{k+1} |c_{k+1}|\, Y_{k+2}}{L B_k},$

(14-37)    $B_k = |b_{k0}| + |b_{k1}| + \cdots + |b_{kk}|.$

In order to obtain a solution satisfying (14-35), we add to this solution a
suitable solution of the homogeneous equation

    $w_{n+1} = w_n + hL\{|b_{k0}| w_n + |b_{k1}| w_{n-1} + \cdots + |b_{kk}| w_{n-k}\}.$

The characteristic polynomial of the equation is

    $p(z) = z^{k+1} - z^k - hL\{|b_{k0}| z^k + |b_{k1}| z^{k-1} + \cdots + |b_{kk}|\}.$

Evidently, $p(1) \le 0$. On the other hand, if $z = 1 + hLB_k$, where $B_k$ is
given by (14-37), then

    $z^{-k} p(z) = 1 + hLB_k - 1 - hL\{|b_{k0}| + |b_{k1}| z^{-1} + \cdots + |b_{kk}| z^{-k}\}$
    $\phantom{z^{-k} p(z)} \ge hLB_k - hL\{|b_{k0}| + \cdots + |b_{kk}|\}$
    $\phantom{z^{-k} p(z)} = 0.$

It follows that for some $z^*$ such that $1 \le z^* \le 1 + hLB_k$,

    $p(z^*) = 0.$

Thus a solution of (14-34) satisfying (14-35) is given by

    $w_n = (\delta + C)\, z^{*n} - C.$

In view of

    $z^{*n} \le (1 + hLB_k)^n \le e^{(x_n - a) L B_k}$

we finally find, remembering the definition of $C$,

(14-38)    $|z_n| \le \delta\, e^{(x_n - a) L B_k} + h^{k+1} |c_{k+1}|\, Y_{k+2}\, \frac{e^{(x_n - a) L B_k} - 1}{L B_k}, \quad a \le x_n \le b.$

In summary, we have


Theorem 14.6 If the starting values of the Adams-Bashforth method
are in error by at most $\delta$, then the errors $z_n = y_n - y(x_n)$ are bounded
by (14-38), where $Y_{k+2}$ and $B_k$ are defined by (14-33) and (14-37).


As on earlier occasions, the estimate (14-38) should be looked at as a
qualitative rather than a quantitative statement. The important thing is
that, if the starting errors are sufficiently small (namely at most $O(h^{k+1})$),
the error of the Adams-Bashforth algorithm tends to zero like $h^{k+1}$ as
$h \to 0$. Interpolating $f$ in (14-30) by an interpolating polynomial of
degree $k$ has the same effect as expanding the solution in a Taylor polynomial
of degree $k + 1$. However, the coefficient of $Y_{k+2}$ in (14-38) is
markedly greater than the corresponding coefficient of $Y_{p+1}$ in (14-15),
on account of the fact that both

    $|c_{k+1}| > \frac{1}{(k + 2)!}$   and   $B_k > 1$

for $k > 0$.


Problems 


14*. Assuming that the starting values are exact, show that the errors $z_n$ of the
Adams-Bashforth method satisfy

    $z_n = z(x_n)\, h^{k+1} + O(h^{k+2}),$

where $z$ denotes the solution of the initial value problem

    $z' = f_y(x, y(x))\, z - c_{k+1}\, y^{(k+2)}(x),$
    $z(a) = 0.$

[Analog of theorem 14.4.]

15. Solve problem 10 by the Adams-Bashforth method with $k = 2$, determining
the starting values $y_1$ and $y_2$ by Runge-Kutta or by a series
expansion around $x = 0$. Use the steps $h = 0.2$ and $h = 0.1$ and
extrapolate to the limit at $x = 0.8$.


14.7 Methods Based on Numerical Integration: The Adams-Moulton Method


A considerable refinement and improvement of the Adams-Bashforth
method can be obtained as follows: Suppose a tentative value of $y_{n+1}$
has been obtained, for instance by the Adams-Bashforth method. We shall
call this value $y_{n+1}^{(0)}$. Setting $f_{n+1}^{(0)} = f(x_{n+1}, y_{n+1}^{(0)})$, we can construct the
polynomial which at the points $x_{n-k}, x_{n-k+1}, \ldots, x_n, x_{n+1}$ takes on the
values $f_{n-k}, f_{n-k+1}, \ldots, f_n, f_{n+1}^{(0)}$. According to the Newton backward
formula, this polynomial is given by

    $P_{k+1}(x) = \sum_{m=0}^{k+1} (-1)^m \binom{-s + 1}{m} \nabla^m f_{n+1}^{(0)},$

where again $s = (x - x_n)/h$, and where the differences are formed with
the values $f_{n+1}^{(0)}, f_n, \ldots, f_{n-k}$. Integrating between $x_n$ and $x_{n+1}$ we
obtain a new value of $y_{n+1}$, to be called $y_{n+1}^{(1)}$, according to the formula

(14-39)    $y_{n+1}^{(1)} = y_n + h\{c_0^* f_{n+1}^{(0)} + c_1^* \nabla f_{n+1}^{(0)} + \cdots + c_{k+1}^* \nabla^{k+1} f_{n+1}^{(0)}\}.$

Here the coefficients $c_m^*$ are defined by (13-8) and are tabulated in table
13.4a.

In general, the value $y_{n+1}^{(1)}$ calculated by (14-39) will disagree with
$y_{n+1}^{(0)}$. We therefore compute $f_{n+1}^{(1)} = f(x_{n+1}, y_{n+1}^{(1)})$, correct the difference
table of the values of $f$, and reevaluate the term on the right in (14-39). A
new value $y_{n+1}^{(2)}$ will result. The general step of this iteration procedure is
described by the formula

(14-40)    $y_{n+1}^{(i+1)} = y_n + h\{c_0^* f_{n+1}^{(i)} + c_1^* \nabla f_{n+1}^{(i)} + \cdots + c_{k+1}^* \nabla^{k+1} f_{n+1}^{(i)}\},$

the differences on the right being formed with $f_{n+1}^{(i)} = f(x_{n+1}, y_{n+1}^{(i)})$,
$f_n, \ldots, f_{n-k}$. Theoretically, the iteration with respect to $i$ should be
continued indefinitely, and the limit as $i \to \infty$ of the sequence $y_{n+1}^{(i)}$ would
be accepted as the final value of $y_{n+1}$. In practice, of course, the iteration
continues only to the point where the sequence $y_{n+1}^{(i)}$ has converged
numerically in the sense that an inequality

    $|y_{n+1}^{(i+1)} - y_{n+1}^{(i)}| < \varepsilon$

holds, where $\varepsilon$ is some preassigned number. More frequently even, the
iteration is performed only a fixed number $I$ of times, where $I$ is a
preassigned integer such as 2 or 3. We state the last version
of the procedure as a formalized algorithm.


Algorithm 14.7 (The Adams-Moulton method.) For a given function
$f = f(x, y)$, a given constant $h > 0$, given integers $k > 0$ and
$I > 0$, and given values $y_0, y_1, \ldots, y_k$, let $x_m = x_0 + mh$,
$f_m = f(x_m, y_m)$ $(m = 0, 1, \ldots, k)$, and generate the sequence $y_n$ by the
following set of formulas: Form $\nabla f_k, \nabla^2 f_k, \ldots, \nabla^k f_k$, and calculate
for $n = k, k + 1, \ldots$

(14-31)    $x_{n+1} = x_n + h,$
           $y_{n+1}^{(0)} = y_n + h\{c_0 f_n + c_1 \nabla f_n + \cdots + c_k \nabla^k f_n\},$

and

    $y_{n+1} = y_{n+1}^{(I)},$

where for $i = 0, 1, \ldots, I - 1$

    $f_{n+1}^{(i)} = f(x_{n+1}, y_{n+1}^{(i)}),$
    $\nabla^p f_{n+1}^{(i)} = \nabla^{p-1} f_{n+1}^{(i)} - \nabla^{p-1} f_n \quad (p = 1, 2, \ldots, k + 1)$

and

(14-40)    $y_{n+1}^{(i+1)} = y_n + h\{c_0^* f_{n+1}^{(i)} + c_1^* \nabla f_{n+1}^{(i)} + \cdots + c_{k+1}^* \nabla^{k+1} f_{n+1}^{(i)}\}.$


The reader will observe that this algorithm involves three iterations:
The "outer" iteration for calculating the sequence $\{y_n\}$, an inner iteration
for calculating the sequence of values $\{y_{n+1}^{(i)}\}$ for each $n$, and a further
inner iteration for calculating the differences $\nabla^p f_{n+1}^{(i)}$ for each $i$. (Problem
19 shows that this last iteration can be avoided, however.) Since the
tentative values $y_{n+1}^{(i)}$ are "corrected" at each application of formula
(14-40), this formula is called a corrector formula. The formula which
produces a first approximation $y_{n+1}^{(0)}$ is called a predictor formula. The
Adams-Moulton method thus belongs to the class of so-called predictor-corrector
procedures. Like the Runge-Kutta method, it is one of the
most reliable and most widely used procedures for the numerical integration
of ordinary differential equations.

Let us convince ourselves that the sequence $\{y_{n+1}^{(i)}\}$ generated by repeated
application of the corrector formula would converge for $i \to \infty$. For
the purpose of analysis, we express the differences in (14-40) in terms of
ordinates and observe that only the foremost values $f_{n+1}^{(i)}$ involve $i$.
Setting

    $w_i = y_{n+1}^{(i)}, \quad i = 0, 1, 2, \ldots,$

equation (14-40) is of the form

(14-41)    $w_{i+1} = F(w_i),$

where

    $F(w) = h\,(c_0^* + c_1^* + \cdots + c_{k+1}^*)\, f(x_{n+1}, w) + C,$

$C$ being a constant independent of $w$. According to theorem 4.2 the
sequence $\{w_i\}$ defined by (14-41) converges, for any choice of the starting
value $w_0$, if there exists a constant $K < 1$ such that, for any two real
numbers $y$ and $z$,

(14-42)    $|F(z) - F(y)| \le K|z - y|.$

Problem 8 of chapter 13 shows that

(14-43)    $c_0^* + c_1^* + \cdots + c_{k+1}^* = c_{k+1},$

where $c_{k+1}$ is defined by (13-5). Hence if $f$ satisfies a Lipschitz condition
with Lipschitz constant $L$,

    $|F(z) - F(y)| = h|c_{k+1}|\, |f(x_{n+1}, z) - f(x_{n+1}, y)| \le h|c_{k+1}|\, L|z - y|.$

Thus (14-42) is shown to hold with

    $K = h|c_{k+1}|L,$

and this is less than one whenever

(14-44)    $h < \frac{1}{|c_{k+1}|\, L}.$

We thus have obtained:


Theorem 14.7 The inner iteration necessary to obtain each new
value $y_{n+1}$ in the Adams-Moulton method converges for any choice
of the predicted value $y_{n+1}^{(0)}$ whenever the step $h$ satisfies (14-44).
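
A predictor-corrector step of this kind can be sketched for the smallest interesting case $k = 1$, with the differences written out as ordinates. The coefficient values used here are assumptions of the sketch, namely the standard ones $c_0 = 1$, $c_1 = 1/2$ for the predictor and $c_0^* = 1$, $c_1^* = -1/2$, $c_2^* = -1/12$ for the corrector (tables 13.4a and 13.4b are not reproduced in this section); the function name and the test problem $y' = -y$ are likewise illustrative.

```python
from math import exp

def adams_moulton_k1(f, a, y0, y1, h, steps, corrections=2):
    """Sketch of algorithm 14.7 for k = 1 with I = corrections corrector sweeps.

    Predictor: y = y_n + h (3/2 f_n - 1/2 f_{n-1})
    Corrector: y = y_n + h (5/12 f_{n+1} + 8/12 f_n - 1/12 f_{n-1})
    """
    ys = [y0, y1]
    fs = [f(a, y0), f(a + h, y1)]
    for n in range(1, steps):
        x_next = a + (n + 1) * h
        y = ys[n] + h * (1.5 * fs[n] - 0.5 * fs[n - 1])       # predicted value
        for _ in range(corrections):                           # inner iteration
            y = ys[n] + h * (5 * f(x_next, y) + 8 * fs[n] - fs[n - 1]) / 12
        ys.append(y)
        fs.append(f(x_next, y))
    return ys

f = lambda x, y: -y            # Lipschitz constant L = 1
h, L = 0.05, 1.0
K = h * (5 / 12) * L           # contraction constant K = h |c_2| L of the inner iteration
approx = adams_moulton_k1(f, 0.0, 1.0, exp(-h), h, 20)
print(K, approx[-1], exp(-1.0))
```

With $K$ this small, two corrector sweeps already reduce the distance to the fixed point of the corrector by a factor of roughly $K^2$, which is why a small fixed $I$ is usually sufficient.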


By theorem 4.2, the quantity

    $y_{n+1} = \lim_{i \to \infty} y_{n+1}^{(i)}$

is a solution of the equation

(14-45)    $y_{n+1} = y_n + h\{c_0^* f_{n+1} + c_1^* \nabla f_{n+1} + \cdots + c_{k+1}^* \nabla^{k+1} f_{n+1}\}.$

Theoretically, we could thus define the Adams-Moulton values $y_{n+1}$ as
the solution of the non-linear difference equation (14-45). The error
$z_n = y_n - y(x_n)$ of these values can be investigated by a method entirely
analogous to that used in the proof of theorem 14.6. Expressing differences
in terms of ordinates, we write (14-45) in the form

    $y_{n+1} = y_n + h\{b_{k+1,0}^* f_{n+1} + b_{k+1,1}^* f_n + \cdots + b_{k+1,k+1}^* f_{n-k}\}.$

Setting

(14-46)    $B_{k+1}^* = |b_{k+1,0}^*| + |b_{k+1,1}^*| + \cdots + |b_{k+1,k+1}^*|,$

and assuming that the starting values satisfy (14-32), we can show that


Ϊ 


14-47 ee το, τις, 
( ) ΙΖ, ΞΞ 1 ἜΞΩ hl era |L 


{Bet —a)BR, 11, 


; ef =, ~ ΒΝ... aot 
# πε ᾿ 
ἘΠ lef alt 3 Yunus = 


BEL 
for $a \le x_n \le b$. Here $L$ again denotes a Lipschitz constant for the
function $f$. If the starting values are sufficiently accurate, then the error
bound (14-47) is considerably better than the corresponding bound
(14-38) for the Adams-Bashforth method using equally many points,
because $h^{k+1}$ is now replaced by $h^{k+2}$, and $|c_{k+2}^*| < |c_{k+1}|$, $B_{k+1}^* < B_k$.
Similar bounds can also be obtained for the more practical case where
the inner iteration is only performed a finite number $I$ of times, as indicated
in algorithm 14.7. It can be shown that the above qualitative conclusions
still hold when $I = 2$.


Problems 


16. Determine the constants $b_{km}$ and $b_{k+1,m}^*$ for $k = 0, 1, 2, 3$ and calculate
the values of the constants $B_k$ and $B_{k+1}^*$.

17. Repeat problem 15, using the Adams-Moulton procedure with $k = 2$.

18. Give a proof of (14-47) along the lines of the proof of theorem 14.6.

19. Show that, by virtue of (14-43), the values $y_{n+1}^{(i)}$ defined in algorithm 14.7
can be calculated more simply from the relation

    $y_{n+1}^{(i+1)} = y_{n+1}^{(i)} + h c_{k+1}\{f(x_{n+1}, y_{n+1}^{(i)}) - f(x_{n+1}, y_{n+1}^{(i-1)})\} \quad (i \ge 1).$
14.8 Numerical Stability 


In order to bring the discussions of the two preceding sections to a 
more concrete level, we now shall determine explicitly the numerical 
solutions of the initial value problem 


[γ7(0}) = 1 
(14-48 " agg 
» = Ay 

(exact solution: y(x) = e4*) by several methods based on numerical 
integration. 

We begin with the Adams-Bashforth method (algorithm 14.6) for $k = 1$.
Since $c_0 = 1$, $c_1 = \tfrac12$, (14-31) yields the recurrence relation

$y_{n+1} = y_n + h(f_n + \tfrac12 \nabla f_n) = y_n + \tfrac12 h(3f_n - f_{n-1})$

or, since in the present case $f_n = f(x_n, y_n) = Ay_n$,

(14-49)    $y_{n+1} = y_n + \tfrac12 Ah(3y_n - y_{n-1})$.


284 elements of numerical analysis 
This is a linear difference equation of order two with constant coefficients. 
Its characteristic polynomial 


$p(z) = z^2 - (1 + \tfrac32 Ah)z + \tfrac12 Ah$

has the zeros

$z_1 = \tfrac12 + \tfrac34 Ah + \sqrt{\tfrac14 + \tfrac14 Ah + \tfrac{9}{16}(Ah)^2}$

and, by Vieta,

$z_2 = \dfrac{Ah}{2z_1}$.

These zeros are certainly distinct for sufficiently small values of $h$. By
§6.3 the general solution is thus given by

$y_n = c_1 z_1^n + c_2 z_2^n$,

where $c_1$ and $c_2$ are two arbitrary constants. Determining these constants
in such a manner that the solution satisfies $y_0 = 1$, we obtain

(14-50)    $y_n = (1 - c_2)z_1^n + c_2 z_2^n$,

where $c_2$ is a function of the as yet unspecified value $y_1$,

(14-51)    $c_2 = \dfrac{y_1 - z_1}{z_2 - z_1}$.


As in §14.3, we now shall study the behavior of $y_n$ as $h \to 0$ and $n \to \infty$
while $nh = x$ remains fixed. We begin by expressing $z_1^n$ in such a way
that this behavior becomes evident. Expanding the square root in powers
of $Ah$ (see Taylor [1959], §15.10) we find

$z_1 = 1 + Ah + \tfrac12 (Ah)^2 - \tfrac14 (Ah)^3 + O(h^4)$.

By virtue of

$e^{Ah} = 1 + Ah + \tfrac12 (Ah)^2 + \tfrac16 (Ah)^3 + O(h^4)$

this may be written

(14-52)    $z_1 = e^{Ah} - \tfrac{5}{12}(Ah)^3 + O(h^4)$

or, since $e^{Ah} = 1 + O(h)$,

$z_1 = e^{Ah}[1 - \tfrac{5}{12}(Ah)^3 + O(h^4)]$.

By virtue of $nh = x$ it follows that

$z_1^n = e^{Ax}[1 - \tfrac{5}{12} xA^3h^2 + O(h^3)]$.

Since $z_2 = O(h)$, $z_2^n = O(h^n)$ is very small compared with $z_1^n$. We thus
obtain from (14-50), neglecting less important terms,

(14-53)    $y_n \approx e^{Ax} - \tfrac{5}{12} A^3 x e^{Ax} h^2 - c_2 e^{Ax}$.




Here the first term is clearly the desired exact solution. The second term
arises as a result of approximating the integral in (13-6) by a discrete
formula. In the general case it would be

$-c_{k+1} A^{k+2} x e^{Ax} h^{k+1}$.


The third term is a function of the choice of our second starting value $y_1$.
It would be zero if we succeeded in choosing $y_1 = z_1$. It is small but
different from zero, if we take $y_1 = e^{Ah}$, the value of the exact solution
at $x_1$. The significant fact is that both errors, the one due to discretization
as well as the starting error, contain the factor $e^{Ax}$. This is particularly
important when $A$ is negative, e.g., when $A = -1$. In this case, the
errors do not grow indefinitely; the discretization error, for instance,
grows up to $x = 1$ and then tends to zero, like the solution itself.
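
The expansion (14-52) can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `adams_bashforth_roots` and the sample step sizes are chosen here for illustration only. It evaluates the two zeros of the characteristic polynomial and prints the quotient $(z_1 - e^{Ah})/(Ah)^3$, which by (14-52) should approach $-\tfrac{5}{12} \approx -0.4167$ as $h$ decreases.

```python
import math

def adams_bashforth_roots(A, h):
    """Zeros z1, z2 of p(z) = z^2 - (1 + 3/2 A h) z + 1/2 A h."""
    t = A * h
    z1 = 0.5 + 0.75 * t + math.sqrt(0.25 + 0.25 * t + (9.0 / 16.0) * t * t)
    z2 = 0.5 * t / z1  # Vieta: z1 * z2 = A h / 2
    return z1, z2

A = -1.0
for h in (0.1, 0.05, 0.025):
    z1, z2 = adams_bashforth_roots(A, h)
    # (14-52) predicts z1 - e^{Ah} = -(5/12)(Ah)^3 + O(h^4)
    quotient = (z1 - math.exp(A * h)) / (A * h) ** 3
    print(f"h={h:<6} z1={z1:.8f} z2={z2:.6f} quotient={quotient:.4f}")
```

The small modulus of $z_2$ in the output illustrates why the term $c_2 z_2^n$ is negligible for this method.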

Let us now examine the integration of the same problem by a different 
integration formula. As the error of the interpolating polynomial is 
smallest near the median interpolating point, one is always tempted to 
use symmetric central difference formulas for numerical integration. To 
consider the simplest case, let us integrate the identity 


$y'(x) = f(x, y(x))$


between the limits $x_{n-1}$ and $x_{n+1}$, using the midpoint rule (see problem 4,
chapter 13) to approximate the integral. There results the formula


(14-54)    $y_{n+1} = y_{n-1} + 2hf_n$.


This formula could be used, in principle, much like the Adams-Bashforth 
formula. For the special initial value problem (14-48) we obtain the 
difference equation 


$y_{n+1} = 2Ahy_n + y_{n-1}$.

Its characteristic polynomial

$p(z) = z^2 - 2Ahz - 1$

has the two distinct zeros

$z_1 = Ah + \sqrt{1 + (Ah)^2}$

and

$z_2 = -\dfrac{1}{z_1}$.

All solutions satisfying $y_0 = 1$ can thus again be represented in the form
(14-50).




For the purpose of exploring the asymptotic behavior of the solution $y_n$
as $h \to 0$ while $nh = x$ remains fixed, we again expand $z_1$ in powers of $h$.
Using the binomial formula (see Taylor [1959], p. 479) we obtain

$z_1 = 1 + Ah + \tfrac12 (Ah)^2 + O(h^4)$

or, by comparison with the exponential series,

$z_1 = e^{Ah} - \tfrac16 (Ah)^3 + O(h^4)$.

Proceeding as above, we thus find

$z_1^n = e^{Ax}[1 - \tfrac16 xA^3h^2 + O(h^3)]$

and

$z_2^n = (-z_1)^{-n} = (-1)^n e^{-Ax}[1 + \tfrac16 xA^3h^2 + O(h^3)]$.

It is seen that $z_1^n$ and $z_2^n$ have the same order of magnitude, and that the
term $c_2 z_2^n$ can now no longer be neglected. We thus find from (14-50), up
to less significant terms,

(14-55)    $y_n \approx e^{Ax} - \tfrac16 A^3 x e^{Ax} h^2 - c_2 e^{Ax} + c_2(-1)^n e^{-Ax}$.


In the leading term we again recognize the desired solution of the
continuous problem. The second term represents the genuine discretization
error, due to the approximation of an integral by a discrete formula.
This term is smaller than the corresponding term in (14-53), emphasizing
the greater accuracy of central difference formulas. The last two terms
are "starting" errors; they owe their presence to the fact that $y_1 \ne z_1$ in
general. In particular both terms are present if we take $y_1 = e^{Ah}$, the
exact value. These starting errors would not be of much concern if it
were not for the fact that the second term, $(-1)^n c_2 e^{-Ax}$, shows a behavior
whose character is exactly opposite to that of the exact solution. If $A$ is
negative, this term will grow at an exponential rate and, no matter how
small initially, overshadow all other terms if $x$ is sufficiently large.


EXAMPLE

8. Table 14.8 shows some values of the errors $z_n = y_n - e^{-x_n}$ of the
numerical approximations to the solution of $y' = -y$, $y(0) = 1$ obtained
by the formulas (14-49) and (14-54) with the step $h = 0.1$.

While for small values of x the Adams-Bashforth method has the 
larger errors (due to the fact that the discretization error is larger) we have 
at x = 5 reached a point where the error of the midpoint formula is 
larger by a factor 100, due to the exponentially growing oscillatory 
component in (14-55). 

Table 14.8

  $x_n$     $10^5 z_n$     $10^5 z_n$
            Adams        Midpoint
  0.1          0             0
  0.2         38            30
  0.3         70            21
  0.4         90            51
  5.0         13          1120
  5.1         14         -1385
  5.2         12          1406
  5.3         11         -1665
  5.4         11          1730

The presence of oscillatory error terms that grow relative to the exact
solution represents a special kind of numerical instability that is typical
for a number of formulas for integrating differential equations. This
phenomenon of numerical instability can be traced to the fact that the
characteristic polynomial of the linear difference equation arising from
integrating $y' = Ay$ has several zeros of approximate modulus one. This,
in turn, will always be the case if the identity $y'(x) = f(x, y(x))$ is integrated
over an interval whose length exceeds one integration step.
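
The experiment of example 8 is easy to repeat. The sketch below is an illustration under stated assumptions, not the computation used for the printed table: both recurrences are started with $y_0 = 1$ and the exact value $y_1 = e^{Ah}$, ordinary double precision arithmetic is used, and the function names are hypothetical. It should give errors of the same size as those in table 14.8, including the alternating, growing midpoint errors near $x = 5$.

```python
import math

def adams_bashforth(A, h, n_steps):
    """Recurrence (14-49): y_{n+1} = y_n + (A h / 2)(3 y_n - y_{n-1})."""
    y = [1.0, math.exp(A * h)]  # y_0 and the exact starting value y_1
    for n in range(1, n_steps):
        y.append(y[n] + 0.5 * A * h * (3.0 * y[n] - y[n - 1]))
    return y

def midpoint(A, h, n_steps):
    """Recurrence from (14-54): y_{n+1} = y_{n-1} + 2 A h y_n."""
    y = [1.0, math.exp(A * h)]
    for n in range(1, n_steps):
        y.append(y[n - 1] + 2.0 * A * h * y[n])
    return y

A, h, n_steps = -1.0, 0.1, 54
ab = adams_bashforth(A, h, n_steps)
mp = midpoint(A, h, n_steps)

print(" x_n   1e5*err(Adams)  1e5*err(Midpoint)")
for n in (1, 2, 3, 4, 50, 51, 52, 53, 54):
    exact = math.exp(A * n * h)
    print(f"{n * h:4.1f}  {1e5 * (ab[n] - exact):14.1f}  {1e5 * (mp[n] - exact):17.1f}")
```

Replacing the starting value of the midpoint run by $z_1 = Ah + \sqrt{1 + (Ah)^2}$ gives a simple way to begin the experiment asked for in problem 23.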


Problems 


20. Carry out the above investigation for the integration algorithm based on
Simpson's formula

$y_{n+1} = y_{n-1} + h(\tfrac13 f_{n+1} + \tfrac43 f_n + \tfrac13 f_{n-1})$.

Show, in particular, that the starting error now has a component of the
form $(-1)^n c_2 e^{-Ax/3}$.

21. Integrating Bessel's interpolating polynomial between the limits $x_{-1}$ and
$x_2$, obtain the following formula for integration:

$y_{n+2} = y_{n-1} + \tfrac{3h}{8}(f_{n+2} + 3f_{n+1} + 3f_n + f_{n-1})$.

22*. Investigate the stability of the formula obtained in the preceding problem.

23. In constructing table 14.8, $y_1$ was chosen as the exact value $e^{-h}$. Show
experimentally that, due to rounding errors, the phenomenon of numerical
instability occurs also for $y_1 = z_1 = -h + \sqrt{1 + h^2}$, where theoretically
$c_2 = 0$.


Recommended Reading 


A large variety of methods for integrating differential equations is 
discussed in Milne [1953]. Collatz [1960] contains a wealth of valuable
information. The phenomenon of numerical instability of algorithms
for the solution of initial value problems was first discussed by Rutis- 
hauser [1952] in a brief but classical paper. A comprehensive treatment of 
errors and numerical stability will be found in Henrici [1962, 1963]. For 
the numerical solution of boundary value problems and partial differential 
equations, which had to be ignored here for reasons of space, we refer to 
the excellent treatises by Fox [1957] and Forsythe and Wasow [1960]. 


Research Problem 


Study the effectiveness of repeated extrapolation to the limit in the 
numerical solution of differential equations. In particular, compare the 
accuracy obtained with repeated extrapolation to the limit in the Euler 
method with that given by the Runge-Kutta method and the Adams-Moulton
method, using the same number of evaluations of $f$.




PART THREE 


COMPUTATION 




chapter 15 number systems


All numbers that have occurred so far in the theoretical discussions in 
this book were real (or complex) numbers in the strict mathematical sense. 
That is, they were to be conceived as infinite decimal fractions, or as 
Dedekind cuts. For the purposes of computation such numbers have to 
be approximated by real numbers of a rather special type, such as 
terminating decimal fractions, or other rational numbers. The present
chapter is devoted to a study of the number systems that can be used for 
the purposes of computation. 


15.1 Representation of Integers 


Let us take a look at our conventional number system. What do we 
mean by a symbol such as 247? Evidently 


$247 = 2 \cdot 100 + 4 \cdot 10 + 7$
$\phantom{247} = 2 \cdot 10^2 + 4 \cdot 10^1 + 7 \cdot 10^0$.


The number 247 is represented as a polynomial in the base 10, with 
integral coefficients between 0 and 9 = 10 — 1. 

There is no intrinsic reason why 10 should be used as a base; the 
number of fingers may have to do with it. There is evidence that in 
cultures different from ours other number systems have been used. The 
French word quatre-vingts for the number 80 indicates a system with 
base 20. (Maybe the French counted with their toes as well as with their 
fingers.) In New Zealand, words for $11^2$ and $11^3$ have been found. The
Babylonian astronomers used a sexagesimal system, i.e., a system with the 
base 60. A trace of this can be found in our dividing the circumference of 
the circle into 360 degrees. Also mixed systems, although mathematically 
much less satisfying, are in use, such as the Anglo-Saxon system for 
measuring length, and the English monetary system. 




In electronic computation, the digits of an integer are represented by 
various states of a physical quantity, such as an electric current. The 
technically simplest situation arises when there are only two states to be 
represented, such as the state “no current” and the state “a unit current.”
For this reason, modern electronic computers work internally almost 
exclusively with the base 2. The resulting number system is called the 
binary number system. In this system, only the digits 0 and 1 occur. In 
order to distinguish them from decimal digits, we shall underline them. 
Thus, if a given nonnegative integer N is in the binary system represented 
in the form 


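(15-1)    $N = a_n 2^n + a_{n-1}2^{n-1} + \cdots + a_1 2 + a_0$,

where the $a_k$ are either zero or one, it will be written in the form

$N = \underline{a_n a_{n-1} \cdots a_0}$.

EXAMPLE

1. $\underline{1} = 1$, $2 = \underline{10}$, $3 = \underline{11}$, $\underline{101} = 5$, $8 = \underline{1000}$, $\underline{1010} = 10$.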


If we wish to communicate with a computer working in the binary 
system, we (or the computer) must be able to convert a number from the 
decimal to the binary system and conversely. To convert from binary 
to decimal, we regard the number $N$ given by (15-1) as the value of the
polynomial

$P(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0$

for $x = 2$. To evaluate $P(2)$, we may use algorithm 3.4. (Note that the
coefficients of the polynomial are numbered differently now.) It follows
that if we calculate the numbers $b_k$ recursively by

(15-2)    $b_0 = a_n, \qquad b_k = a_{n-k} + 2b_{k-1} \quad (k = 1, 2, \ldots, n)$,

then $b_n = P(2) = N$.


EXAMPLE

2. To express the number $N = \underline{11111001111}$ in decimal. The Horner
scheme yields

  $k$          0   1   2    3    4    5     6     7     8     9     10
  $a_{n-k}$    1   1   1    1    1    0     0     1     1     1      1
  $b_k$        1   3   7   15   31   62   124   249   499   999   1999

It follows that $N = 1999$.
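
The recursion (15-2) translates directly into a short loop. The following sketch is illustrative only; the function name `binary_to_decimal` and the choice of a digit string as input are assumptions made here, not notation from the text.

```python
def binary_to_decimal(digits):
    """Evaluate (15-2): digits is a string of '0'/'1', most significant first."""
    b = 0
    for d in digits:
        # b_k = a_{n-k} + 2 b_{k-1}; starting from b = 0 gives b_0 = a_n
        b = int(d) + 2 * b
    return b

print(binary_to_decimal("11111001111"))  # the number of example 2
```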


To convert a given integer from decimal to binary, we make use of the
fact that the last binary digit $a_0$ of an integer $N$ is zero if and only if $N$ is
even. The second binary digit $a_1$ is zero if and only if $(N - a_0)/2$ is even,
and so on. This leads to the following scheme:


Algorithm 15.1 To find the binary representation (15-1) of a given
positive integer $N$ let

$N_0 = N$,

(15-3)    $N_{k+1} = \dfrac{N_k - a_k}{2}, \qquad k = 0, 1, 2, \ldots$,

where

(15-4)    $a_k = \begin{cases} 1 & \text{if } N_k \text{ is odd,} \\ 0 & \text{if } N_k \text{ is even.} \end{cases}$

Continue until $N_k = 0$.


EXAMPLE

3. To express $N = 1999$ in binary form. Algorithm 15.1 yields the
scheme

  $k$        0     1     2     3     4    5    6    7   8   9   10
  $N_k$   1999   999   499   249   124   62   31   15   7   3    1
  $a_k$      1     1     1     1     0    0    1    1   1   1    1

It follows that $1999 = \underline{11111001111}$. (Note that the least significant
digits are obtained first.) The scheme is an exact reversal of the scheme of
example 2.
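
Algorithm 15.1 can be sketched in a few lines. The function name `decimal_to_binary` is hypothetical, and the result is returned as a string with the most significant digit first, so the digits collected by the loop are reversed at the end.

```python
def decimal_to_binary(N):
    """Algorithm 15.1 for a positive integer N."""
    digits = []
    while N > 0:
        a = N % 2          # (15-4): a_k = 1 if N_k is odd, 0 if even
        digits.append(a)   # least significant digits are obtained first
        N = (N - a) // 2   # (15-3)
    return "".join(str(a) for a in reversed(digits))

print(decimal_to_binary(1999))  # the number of example 3
```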


Problems 


1. Express the numbers 1685, 1770, 1882 in the binary system. 

2. What are the decimal values of the binary numbers 1’000, 1’000’000,
1’000’000’000?

3. What is, for $N \to \infty$, the ratio of the number of digits in the binary and
the decimal representation of an integer $N$?

4. State the rules for the conversion of a decimal integer $N$ into the ternary
system, and vice versa. Express $10^n$ in the ternary system ($n = 1, 2, 3$).

5. What is the representation of the positive integer N in the number system 
to the base N? 


15.2 Binary Fractions 


A binary fraction is a series of the form 


(15-5)    $\sum_{k=1}^{\infty} a_{-k} 2^{-k}$,

where the coefficients $a_{-1}, a_{-2}, \ldots$ are either zero or one. The series
(15-5) always converges, because it is majorized by the geometric series

$\sum_{k=1}^{\infty} 2^{-k} = 1$.

The sum $z$ of (15-5) will also be denoted by

$z = 0.a_{-1}a_{-2}a_{-3}\ldots$.

The binary fraction (15-5) is said to terminate if, for some integer $n$,
$a_{-k} = 0$ for $k > n$.
The following theorem is fundamental, but will not be proved: 


Theorem 15.2a Any real number z such that 0 < z <1 can be 
represented in a unique manner by a nonterminating binary fraction. 


If we drop the condition that the binary fraction shall not terminate, then 
the representation may not be unique; for instance, the binary fractions 
$\underline{0.1}$ and $\underline{0.0111\ldots}$
both represent the number 0.5. 


A terminating binary fraction $z = 0.a_{-1}a_{-2}\ldots a_{-n}$ can be regarded as
the value of the polynomial

$P(x) = a_{-1}x + a_{-2}x^2 + \cdots + a_{-n}x^n$

at $x = \tfrac12$ and thus can be evaluated by algorithm 3.4 (Horner's scheme) as
follows: Let

$b_0 = a_{-n}$,

$b_k = a_{-n+k} + \tfrac12 b_{k-1}, \qquad k = 1, 2, \ldots, n$,

where $a_0 = 0$. Then $b_n = z$.

EXAMPLE

4. To express $z = \underline{0.00110011}$ in decimal. Horner's scheme yields

  $k$    $a_{-8+k}$    $b_k$
  0        1         1
  1        1         1.5
  2        0         0.75
  3        0         0.375
  4        1         1.1875
  5        1         1.59375
  6        0         0.796875
  7        0         0.3984375
  8        0         0.19921875

It follows that $z = 0.19921875$.
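
Horner's scheme at $x = \tfrac12$ can be written as the following sketch. Exact rational arithmetic from Python's `fractions` module is used so that no rounding enters; the function name `binary_fraction_to_decimal` and the string input are assumptions made for this illustration.

```python
from fractions import Fraction

def binary_fraction_to_decimal(digits):
    """digits: the binary digits after the point, e.g. "00110011"."""
    b = Fraction(0)
    for d in reversed(digits):   # a_{-n}, a_{-n+1}, ..., a_{-1}
        b = int(d) + b / 2       # b_k = a_{-n+k} + b_{k-1} / 2
    return b / 2                 # last step, with a_0 = 0

z = binary_fraction_to_decimal("00110011")
print(z, float(z))
```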




Another method for converting a terminating binary fraction consists in 
converting the integer 


$2^n z = a_{-1}2^{n-1} + a_{-2}2^{n-2} + \cdots + a_{-n}$

and dividing the result by $2^n$.

Except in special circumstances, non-terminating binary fractions
cannot be converted into terminating decimal fractions. To get an
approximate decimal representation, we truncate an infinite binary
fraction after the $n$th digit and convert the resulting terminating fraction.
The error in this approximation will be less than $2^{-n}$.

The inverse problem of converting a given (decimal) fraction into a 
binary fraction is solved by the following algorithm: 


Algorithm 15.2 For a real number $z$ such that $0 < z < 1$, calculate
the sequences $\{z_k\}$ and $\{a_{-k}\}$ recursively by the relations

$z_1 = z$,

(15-6)    $a_{-k} = \begin{cases} 1 & \text{if } 2z_k > 1, \\ 0 & \text{if } 2z_k \le 1, \end{cases} \qquad z_{k+1} = 2z_k - a_{-k}, \qquad k = 1, 2, \ldots$.

Theorem 15.2b For the sequence $\{a_{-k}\}$ defined by (15-6),

$z = 0.a_{-1}a_{-2}a_{-3}\ldots$.

Proof. According to theorem 15.2a, $z$ has a nonterminating binary
representation of the form

$z = 0.b_{-1}b_{-2}b_{-3}\ldots$;

we have to show that

(15-7)    $b_{-k} = a_{-k}, \qquad k = 1, 2, \ldots$,

where $a_{-k}$ is defined by (15-6). Incidentally we shall also show that

(15-8)    $z_k = 0.b_{-k}b_{-k-1}b_{-k-2}\ldots, \qquad k = 1, 2, \ldots$.

The simultaneous proof of the two assertions is by induction. Clearly,
(15-8) is true for $k = 1$. Let us assume that (15-7) and (15-8) are true for
the integers $k - 1$ and $k$, where $k \ge 1$. We then have

$2z_k = b_{-k} + 0.b_{-k-1}b_{-k-2}\ldots$.

Since the binary fraction on the right is positive and bounded by one, it
follows that $2z_k > 1$, and hence $a_{-k} = 1$, if and only if $b_{-k} = 1$. This
establishes (15-7), and, by the second formula (15-6), (15-8) with $k$
increased by one.




EXAMPLE

5. To express $z = \tfrac15$ as a binary fraction. We have

  $k$    $z_k$    $2z_k$    $a_{-k}$
  1     0.2     0.4      0
  2     0.4     0.8      0
  3     0.8     1.6      1
  4     0.6     1.2      1
  5     0.2

Since $z_5 = 0.2$ has occurred before, the periodic† binary fraction

$0.2 = 0.001100110011\ldots = 0.\overline{0011}$

results. (The period is indicated by a cross-bar.)
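
Algorithm 15.2 can be sketched as follows for rational $z$, where the repetition of a value $z_k$ signals the start of the period. This is an illustration under assumptions: the function name `to_binary_fraction`, the use of exact `Fraction` arithmetic, and the cut-off `max_digits` are choices made here and do not come from the text.

```python
from fractions import Fraction

def to_binary_fraction(z, max_digits=64):
    """Return (digits, start) where digits[start:] is the repeating period.

    start is None if no value z_k repeated within max_digits steps."""
    z = Fraction(z)
    seen = {}      # maps each z_k to the position of the digit it produced
    digits = []
    while z not in seen and len(digits) < max_digits:
        seen[z] = len(digits)
        doubled = 2 * z
        a = 1 if doubled > 1 else 0   # (15-6): strict inequality
        digits.append(a)
        z = doubled - a
    return digits, seen.get(z)

print(to_binary_fraction(Fraction(1, 5)))
print(to_binary_fraction(Fraction(1, 2)))
```

Because of the strict inequality in (15-6), the second call produces the nonterminating representation of one half rather than the terminating one.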

It is now easy to represent any nonnegative number in the binary system. 
If $z$ is such a number, let $[z]$ be the greatest integer not exceeding $z$.
According to §15.1 we can write

$[z] = \sum_{k=0}^{n} a_k 2^k$,

where the $a_k$ are either zero or one. For the fractional part we have, by
(15-5),

$z - [z] = \sum_{k=1}^{\infty} a_{-k} 2^{-k}$

with the same limitation on the $a_k$. Thus

$z = \sum_{k=0}^{n} a_k 2^k + \sum_{k=1}^{\infty} a_{-k}2^{-k} = a_n a_{n-1} \cdots a_0 . a_{-1}a_{-2} \cdots$.


To convert a binary number into a decimal or vice versa, we must convert 
the integral and the fractional part by themselves. 


Problems 


6. Give the binary value of $\pi$, correct to 12 binary digits after the binary
point.

7. Express the binary numbers (a) 0.10101, (b) 0.10101 as ratios of two 
integers. 

8. Determine the representation of + in the number systems to the base 
(a) 2, (b) 5, (c) 7. 


† It is shown in number theory that every rational fraction can be represented by an
infinite binary fraction that is ultimately periodic. 




9. Devise an algorithm that yields directly the binary representation of the 
square root of a given positive binary number. Use your algorithm to 
determine the binary representations of $\sqrt{2}$, $\sqrt{3}$, $\sqrt{5}$, and check your
result by converting the known decimal representations of these numbers.


15.3. Fixed Point Arithmetic 


As mentioned earlier, in most digital computing machines today numbers 
are internally represented in the binary system. Each element of a 
computer can assume only a finite number of (recognizable and repro- 
ducible) states, and each computer has only finitely many elements. It is 
thus clear that only finitely many different numbers can be represented in a 
computer. A number that can be represented exactly in a given computer 
is called a machine number. All other numbers can be represented in the
computer only, at best, approximately. There are two principal ways in 
which a given number z can be represented. They are known as the 
fixed point and as the floating point representation of z. 

In fixed point operation, the machine numbers are terminating binary 
fractions of the form 


(15-9)    $\pm\sum_{k=1}^{t} a_{-k} 2^{-k}$

where $t$ is an integer that is either fixed for a given machine ($t = 35$ for the
IBM 7090, $t = 48$ for the CDC 1604) or, in some cases, can be selected by
the user (e.g., on the IBM 1620). The numbers used by the machine can
also be described as the set of numbers $n \cdot 2^{-t}$, where $n$ is an integer,
$|n| < 2^t$. The density of machine numbers is uniform in the interval
$(-1, 1)$ and zero outside this interval.

If z is any real number, we shall denote by z* one of the (at most two) 
terminating binary fractions of the form (15-9) for which |z — z*| is a 
minimum. If the interval 


$R = [-1 + 2^{-t-1},\; 1 - 2^{-t-1}]$

is called the range of the machine, then

(15-10)    $|z - z^*| \le 2^{-t-1}$ for all $z \in R$,

that is, any real number within the range of the machine can be approximately
represented by a machine number $z^*$ with an error of at most
$2^{-t-1}$. Any of the (at most two) representations $z^*$ of a number $z$ will
be called a correctly rounded fixed point representation of $z$.

If $a$ and $b$ are two numbers within the range of the machine, the numbers
$a \pm b$ do not necessarily belong to the range. If they do not belong to
the range we say that overflow occurs. The ever-present possibility of 
overflow is one of the serious disadvantages of fixed point arithmetic. 




The above remark shows very clearly that even if the data of a compu- 
tational problem are within range, we can usually not be sure that all 
intermediate results fall within the range. It is frequently possible, 
however, to obtain a (possibly very crude) a priori estimate of the size of 
the intermediate results. One then may try to reformulate the problem 
(for instance by introducing new units of measurement) in such a manner 
that the data as well as the intermediate and final results are within range. 
This reformulation is known as scaling. The frequent necessity of scaling 
is a further disadvantage of fixed point arithmetic. 

Let us now consider the accuracy of the arithmetical operations in fixed 
point arithmetic. If $a$ and $b$ are two machine numbers, and if $a \pm b$ is
within range, then it is clear that $a \pm b$ is also a machine number. That is,
addition can be performed without error in a fixed point machine, i.e.,
we have


(15-11)    $(a \pm b)^* = a \pm b$, if $a \pm b \in R$.


The same is not true of multiplication, however. If $a$ and $b$ are two
machine numbers, then $ab$ is, in general, not a machine number. However,
$ab \in R$, hence there exists a correctly rounded machine value of $ab$,
and we have


(15-12)    $|(ab)^* - ab| \le 2^{-t-1}$.


The ratio $a/b$ of two machine numbers is, in general, not in the range of the
machine. (The probability for this to be the case is only 50 per cent, a
further disadvantage of fixed point arithmetic.) If it is, we again have


$|(a/b)^* - a/b| \le 2^{-t-1}$,


independently of the size of $|a/b|$.
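
The statements (15-11) and (15-12) can be tried out on a toy machine. In the sketch below the word length $t = 8$, the function name `round_fixed`, and the convention of rounding ties upward are assumptions made for illustration; the range check against $R$ is omitted.

```python
import math
from fractions import Fraction

def round_fixed(z, t):
    """A correctly rounded fixed point value z* = n * 2^-t (ties rounded up)."""
    z = Fraction(z)
    n = math.floor(z * 2**t + Fraction(1, 2))
    return Fraction(n, 2**t)

t = 8
a = Fraction(77, 2**t)     # two machine numbers of the form n * 2^-t
b = Fraction(-30, 2**t)

# (15-11): the sum of two machine numbers is again a machine number.
print(round_fixed(a + b, t) == a + b)

# (15-12): the product generally is not, but its rounding error is bounded.
product_error = abs(round_fixed(a * b, t) - a * b)
print(product_error, product_error <= Fraction(1, 2**(t + 1)))
```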

Many machines have built-in subroutines for calculating values of 
elementary functions such as $\sin x$ or $\sqrt{x}$. It is clear that even if the
argument $x$ is a machine number, the value of the function is, in general,
not. In the case of the sine function, the best we can hope for in general
is a correctly rounded value of $\sin x$, but even this ideal is seldom attained.
The author knows of a machine where $\sqrt{0}$ was $2^{-t}$.


Problems 


10. Devise an algorithm for forming the sum of two positive machine numbers 
of the form (15-9). 

11. Reformulate the algorithm obtained in problem 9 in such a manner that it
yields, for every nonnegative machine number $x$, a correctly rounded
value of $\sqrt{x}$.

12. Devise an algorithm that yields the correctly rounded product of two
machine numbers $a$ and $b$.


15.4 Floating Point Arithmetic 


In floating point operation, the set of machine numbers consists of 0 and 
of the set of all numbers of the form 


(15-13) z= +m2? 


where #7 is a terminating binary fraction, 
é 7 
m= » Ὁ, 4im< il 
k=1 


normalized by the condition a_, = 1, and where p is an integer ranging 
between —P and P, say. The integers ¢ and P are normally fixed for a 
given machine. (On the IBM 7090 computer, ¢ = 27, P = 128.) The 
binary fraction m is called the mantissa and the integer p the exponent of 
the floating number (15-13). The machine numbers now cover a much 
wider range than in fixed point arithmetic. We again denote, for any real 
z and for a given machine, by z® any of the at most two numbers of the 
form (15-13) for which |z — z®| is minimized. We now define the 
range of the machine to be the interval 

$R = [-2^P(1 - 2^{-t-1}),\; 2^P(1 - 2^{-t-1})]$.
If $z \in R$, $|z| \ge 2^{-P-1}$, we can define $z^\circ$ by

(15-14)    $z^\circ = \operatorname{sign} z \cdot m \cdot 2^p$

where

$p = [\log_2 |z|] + 1, \qquad m = (2^{-p}|z|)^*$.


Here $\log_2$ denotes the logarithm to the base 2, $[x]$ denotes the greatest
integer not exceeding $x$, and the * refers to correct rounding in fixed point
arithmetic. Any of the (at most two) values $z^\circ$ will be called a correctly
rounded floating representation of $z$. If $|z| < 2^{-P-1}$, we set $z^\circ = 0$.

If $|z| \ge 2^{-P-1}$, we evidently have

$|z - z^\circ| \le 2^p 2^{-t-1} = 2^{p-1}2^{-t} = 2^{[\log_2 |z|]}2^{-t} \le 2^{\log_2 |z|}2^{-t}$

and hence

(15-15)    $|z - z^\circ| \le 2^{-t}|z|$.


This relation shows:


Theorem 15.4 If, for a given floating point machine, $z \in R$, $|z| \ge 2^{-P-1}$,
then the relative error of the correctly rounded floating
representation $z^\circ$ of $z$ is at most $2^{-t}$.


For $z \to 0$ the relative error tends to $\infty$, as in fixed point arithmetic.
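
The construction (15-14) and the bound of theorem 15.4 can be illustrated by the sketch below. It rests on assumptions that are not in the text: the exponent bound $P$ is ignored (taken to be large enough), ties in the mantissa are rounded upward, and the names `round_fixed` and `round_float` as well as the mantissa length $t = 10$ are chosen only for the illustration.

```python
import math
from fractions import Fraction

def round_fixed(x, t):
    """Nearest multiple of 2^-t (ties rounded up), i.e. a value x*."""
    return Fraction(math.floor(x * 2**t + Fraction(1, 2)), 2**t)

def round_float(z, t):
    """Correctly rounded floating value of z following (15-14); P is ignored."""
    z = Fraction(z)
    if z == 0:
        return Fraction(0)
    p = 0
    while abs(z) >= Fraction(2) ** p:        # raise p until |z| < 2^p
        p += 1
    while abs(z) < Fraction(2) ** (p - 1):   # lower p until 2^(p-1) <= |z|
        p -= 1
    m = round_fixed(abs(z) * Fraction(2) ** (-p), t)   # rounded mantissa
    sign = 1 if z > 0 else -1
    return sign * m * Fraction(2) ** p

t = 10
for z in (Fraction(1, 3), Fraction(1999), Fraction(-7, 3), Fraction(1, 10)):
    zr = round_float(z, t)
    rel = abs(z - zr) / abs(z)
    print(z, zr, float(rel), rel <= Fraction(1, 2**t))
```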

Since machines with floating arithmetic can handle very large numbers 
with a small relative error, scaling is, in general, not necessary, and the 
possibility of overflow is minimal. For these reasons, floating arithmetic 
is almost universally used today. The FORTRAN system, for instance, 
employs floating arithmetic. 

Let us examine briefly the accuracy of the basic arithmetic operations in 
floating point arithmetic. If $a$ and $b$ are two floating machine numbers,
and if $a \pm b$ is in the range of the machine, then, unlike the fixed point
case, $a \pm b$ is not necessarily a machine number. Thus we must also
expect rounding errors in addition. If

$a = \pm m 2^p, \qquad b = \pm n 2^q$,

and if $p \ge q$, we have

$a + b = \pm m2^p \pm n2^q = (\pm m \pm n2^{q-p})2^p$.

Here $|\pm m \pm n2^{q-p}| < 2$; thus we have, at worst,

$(a + b)^\circ = \bigl(\tfrac12(\pm m \pm n2^{q-p})\bigr)^* 2^{p+1}$

and hence

$|(a + b)^\circ - (a + b)| \le 2^{-t-1}2^{p+1} = 2^{-t}2^{1 + [\log_2 |a|]}$

and consequently

(15-16)    $|(a + b)^\circ - (a + b)| \le 2^{-t+1}|a|$.

Thus the error of the machine value of a sum (or difference) is at most
$2^{-t+1}$ times the absolute value of the larger summand.

The product $ab$ of two machine numbers $a$ and $b$ is now (contrary to
fixed-point arithmetic) not always a number in the range of the machine.
If the absolute value of the product lies in the interval $[2^{-P}, 2^P]$, we find
by considerations similar to those just given that


(15-17)    $|(ab)^\circ - ab| \le 2^{-t}|ab|$.


Similarly we find for division, if $a/b$ is in the range of the machine,

(15-18)    $\left|\left(\dfrac{a}{b}\right)^\circ - \dfrac{a}{b}\right| \le 2^{-t}\left|\dfrac{a}{b}\right|$.

Concerning built-in subroutines for elementary functions, the remarks
made above apply also to machines with floating point arithmetic.


Multiple precision arithmetic. As was noted above, both fixed point 
numbers and the mantissae of floating numbers are represented in the 
machine in the form $N \times u$, where the number $u = 2^{-t}$ may be called the
basic unit of the machine, and where $N$ is an integer, $|N| < 2^t$. For
some machines and/or some problems this approximation is not sufficiently
accurate. In such cases one can increase the accuracy by working instead
with numbers of the form


(15-19)    $N_1 u + N_2 u^2 + \cdots + N_q u^q$,

where the $N_k$ are integers satisfying $0 \le N_k < 2^t$, $k = 1, 2, \ldots, q$.
This enlargement of the set of available machine numbers is called working
with $q$-fold precision. Double precision ($q = 2$) is fairly common: on
some machines, its use is encouraged by built-in circuitry to facilitate
arithmetical operations with multiple precision numbers of the form
(15-19). Multiple precision with $q > 2$ is normally restricted to special
experimental programs. From a mathematical point of view, multiple
precision is equivalent to single precision with $t$ replaced by $qt$.


Problems 


13. Prove the inequalities (15-17) and (15-18). 
14. Assume that in the approximate formula of numerical differentiation
$D_h f(x_0) = \dfrac{f_1 - f_{-1}}{2h}$,
$f_1$ and $f_{-1}$ are known only with errors $\le 2^{-10}$, whereas $h$ is an exact binary
number.
(a) If floating operations are used, how big is (at most) the resulting error
in $D_h f(x_0)$?
(b) If the function to be differentiated is $f(x) = J_0(x)$, for which value of $h$
is the maximum error in the numerical value of $J_0'(x)$ due to rounding
equal to the maximum error due to numerical differentiation?
(c) For which value is the sum of the two errors a minimum?


Suggested Reading 


Wilkinson [1960] gives a careful account of rounding errors in the 
elementary arithmetic operations. For a wealth of detail from the point 
of view of computer technology see Speiser [1961]. 


Research Problem 


Assume that in a fixed point machine it is possible to identify both the
more and the less significant half of the exact value of the product of two
machine numbers. Formulate an algorithm for obtaining the correctly
rounded value of the product of any two complex numbers whose real
and imaginary parts are machine numbers.


chapter 16 propagation of rounding error


16.1 Introduction and Definitions 


The process of replacing a real number $z$ by a machine number $z^*$ or $z^\circ$
is called rounding. The difference $z^* - z$ or $z^\circ - z$ is called rounding
error. Due to rounding, the final result of a numerical computation 
usually differs from the theoretically correct result. The difference of the 
numerical result and the theoretically correct result is called the accumu- 
lated rounding error. In order to distinguish them from the accumulated 
rounding error, the rounding errors committed at each step of a computation
will also be called the local rounding errors. Except in simple
cases, the accumulated rounding error is not merely the sum of the local 
errors. Each local rounding error is propagated through the remaining 
part of the computation, during which process its influence on the 
accumulated rounding error may either be amplified or diminished. 

Different numerical procedures for solving the same theoretical problem 
(such as integrating a differential equation) may show a different behavior 
with respect to the propagation of local rounding error. This varying 
sensitivity with regard to rounding off operations is frequently referred to 
as the numerical stability of the process. In chapter 8 we noted that the 
numerical stability even of one and the same numerical procedure (the 
QD algorithm) can depend strongly on the way the arithmetical operations
are performed. 

In the present chapter we shall study the propagation of local rounding 
error in some simple but typical cases. 


16.2 Finite Differences 


As a first example we consider a case where the accumulated rounding 
error is due exclusively to errors in the data of the computation; all 
intermediate arithmetic operations are performed exactly. Let $x_n = a + nh$,
$n = 0, 1, 2, \ldots$, let $f$ be a given function, and let $f_n = f(x_n)$.
We consider the effect of replacing the exact values $f_n$ by machine values
$f_n^*$ on the differences of the sequence of values $f_n$.

In fixed point arithmetic, we clearly have

(16-1)    $f_n^* = f_n + \varepsilon_n, \qquad n = 0, 1, 2, \ldots$,

where the local rounding errors $\varepsilon_n$ satisfy

(16-2)    $|\varepsilon_n| \le \varepsilon = \tfrac12 u, \qquad n = 0, 1, 2, \ldots$,

$u$ being the basic unit of the machine (or the table of $f$). According to
(6-27)

$\Delta^k f_n = \sum_{m=0}^{k} (-1)^{k-m}\binom{k}{m} f_{n+m}, \qquad \Delta^k f_n^* = \sum_{m=0}^{k} (-1)^{k-m}\binom{k}{m} f_{n+m}^*$.

Hence, in view of (16-1), if we denote by $r_n^{(k)}$ the accumulated rounding
error in the $k$th difference,

$r_n^{(k)} = \Delta^k f_n^* - \Delta^k f_n = \Delta^k(f_n^* - f_n) = \sum_{m=0}^{k} (-1)^{k-m}\binom{k}{m}\varepsilon_{n+m}$.

It follows by (16-2) that

$|r_n^{(k)}| \le \varepsilon \sum_{m=0}^{k}\binom{k}{m}$.

In view of

$\sum_{m=0}^{k}\binom{k}{m} = (1 + 1)^k = 2^k$,

we obtain for $r_n^{(k)}$ the bound

(16-3)    $|r_n^{(k)}| \le 2^k\varepsilon$.

This bound cannot be improved in general; equality holds whenever
$\varepsilon_{n+m} = (-1)^m\varepsilon$, $m = 0, 1, \ldots, k$.
EXAMPLE

1. We consider an excerpt of a table of sines, tabulated with a step
$h = 0.01$ to six decimal places. In view of

$\Delta^k f_n = h^k f^{(k)}(\xi), \qquad x_n < \xi < x_{n+k}$

(see problem 11, chapter 11), the exact differences satisfy

$|\Delta^k f_n| \le h^k = 10^{-2k}, \qquad k = 0, 1, 2, \ldots$,

and are zero from $\Delta^4$ on to the number of digits given. The following
table shows the numerical differences $\Delta^k f_n^*$ (in units of $10^{-6}$; each
difference is entered in the row of its first argument $x_n$):

Table 16.1

  $x_n$    $f_n^* = (\sin x_n)^*$    $\Delta f_n^*$   $\Delta^2 f_n^*$   $\Delta^3 f_n^*$   $\Delta^4 f_n^*$   $\Delta^5 f_n^*$
  0.25     0.247 404          9677     -27       2      -6      11
  0.26     0.257 081          9650     -25      -4       5      -7
  0.27     0.266 731          9625     -29       1      -2       1
  0.28     0.276 356          9596     -28      -1      -1       2
  0.29     0.285 952          9568     -29      -2       1       0
  0.30     0.295 520          9539     -31      -1       1      -2
  0.31     0.305 059          9508     -32       0      -1
  0.32     0.314 567          9476     -32      -1
  0.33     0.324 043          9444     -33
  0.34     0.333 487          9411
  0.35     0.342 898


The example shows that the local rounding errors $\varepsilon_n$, although hardly
noticeable in the function values $f_n^*$ themselves, show up very strongly in
the higher differences. The propagation of rounding error is real.
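
The difference table of example 1 and the bound (16-3) can be reproduced with a short script. This is a sketch under assumptions: the sine values are rounded to six decimals with Python's `round`, all quantities are expressed in units of $10^{-6}$ (so that $\varepsilon = \tfrac12$), and the helper name `forward_differences` is hypothetical.

```python
import math

def forward_differences(values, k_max):
    """Return [values, first differences, ..., k_max-th differences]."""
    table = [list(values)]
    for _ in range(k_max):
        prev = table[-1]
        table.append([prev[i + 1] - prev[i] for i in range(len(prev) - 1)])
    return table

xs = [(25 + n) / 100 for n in range(11)]          # 0.25, 0.26, ..., 0.35
f_exact = [1e6 * math.sin(x) for x in xs]         # exact values, units of 1e-6
f_star = [round(v) for v in f_exact]              # six-decimal table values
eps = [fs - fe for fs, fe in zip(f_star, f_exact)]  # local rounding errors

diff_star = forward_differences(f_star, 5)        # numerical differences
diff_eps = forward_differences(eps, 5)            # accumulated errors r_n^(k)

for k in range(1, 6):
    worst = max(abs(r) for r in diff_eps[k])
    print(f"k={k}: differences {diff_star[k]}")
    print(f"      max |r^(k)| = {worst:.2f}  (bound 2^k * eps = {2**k * 0.5})")
```

The printed maxima stay below the bound, in line with the remark that the worst case requires local errors of maximal size with strictly alternating signs.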


Problems 


1. A tablemaker wishes to add, in a table of $f(x) = J_0(x)$ giving eight decimal
digits after the decimal point, values of $\delta^2 f_n$ and $\delta^4 f_n$ that differ from the
exact values by at most $0.6 \cdot 10^{-8}$. To what accuracy does $J_0(x)$ have to
be calculated? Does the result depend on the step $h$ or on the function $f$?

2. The impossibility of making a correctly rounded table. A recurrent
nightmare of tablemakers is the following situation. Assume a table of a
function $f$ is to be prepared, giving six digits after the decimal point. For
a certain value of $x_n$, a very accurate computation yields


with an uncertainty of $7 \cdot 10^{-15}$. Should this value be rounded up or
down? A classical result in number theory states that if $c$ is an irrational
number (such as $\sqrt{2}$ or $e$), the numbers

$$cn - [cn], \qquad n = 1, 2, 3, \ldots$$

come arbitrarily close to every number in the interval (0, 1). Show that
this result implies the following: Given any two positive integers $N$ and $M$,
and any finite procedure for calculating square roots, there always exists
an interval such that it is impossible to construct a correctly rounded
table of the function $f(x) = \sqrt{2}\,x$ with the step $10^{-M}$ in that interval,
giving $N$ digits after the decimal point.


16.3 Statistical Approach 


Example 1 above shows that the propagated rounding error in a table
of differences can be considerable. It also shows that the errors in the
differences probably rarely have the maximum values given by (16-3)
(8 and 16 for the last two difference columns). This is due to the fact that
consecutive local rounding errors $\varepsilon_n$ only rarely have the maximum value
permitted by (16-2) and, in addition, occur with strictly alternating signs
such as $\varepsilon, -\varepsilon, \varepsilon, -\varepsilon, \ldots$. Such an occurrence would contradict the
intuitive notion that the local rounding errors are, somehow, distributed
in a random fashion.

It is plain that, on a given machine and for a given problem, the local 
rounding errors are not, in fact, random variables. If the same problem 
is run on the same machine a number of times, there will result always the 
same local rounding errors, and therefore also the same accumulated 
error. We may, however, adopt a stochastic model of the propagation of 
rounding error, where the local errors are treated as if they were random 
variables. This stochastic model has been applied in the literature to a 
number of different numerical problems and has produced results that are 
in complete agreement with experimentally observed results in several 
important cases. 

The most natural assumption concerning the local rounding error in the 
stochastic model seems to be the following: We assume that the local 
rounding errors are uniformly distributed between their extreme values 
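$-\varepsilon$ and $\varepsilon$. The probability density $p(x)$ of this random
distribution, defined as $(\Delta x)^{-1}$ times the probability that the error
lies between $x$ and $x + \Delta x$, is given by

$$p(x) = \begin{cases} 0, & x < -\varepsilon, \\ c, & -\varepsilon \le x \le \varepsilon, \\ 0, & x > \varepsilon. \end{cases} \tag{16-4}$$

The constant $c$ is determined by the condition that the sum of all
probabilities, that is, the integral

$$\int_{-\infty}^{\infty} p(x)\,dx,$$

equals 1. This yields the value

$$c = \frac{1}{2\varepsilon}$$

in (16-4) (see Fig. 16.3a).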


[Figure 16.3: graphs of the probability densities y = p(x); panel (b): normal probability distribution, σ = 1.]


For theoretical purposes it is more convenient to assume that the local 
rounding errors are normally distributed. The normal distribution is 
defined by the probability density

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2/(2\sigma^2)}. \tag{16-5}$$

Here $\sigma$ is a parameter, called the standard deviation of the distribution, that
measures the spread of the distribution. For small values of $\sigma$ the
distribution is narrowly concentrated around $x = 0$, with a sharp peak at
$x = 0$. For large values of $\sigma$ the distribution is flattened out (see Fig.
16.3b).

The absolute value of a normally distributed random variable exceeds $\sigma$
in only 31.7 per cent of all cases and $2.576\sigma$ in only 1 per cent of all cases.

Both distributions considered above have mean value 0. For a random
variable $\xi$ with arbitrary probability density $p(x)$, the mean or expected
value is defined by

$$E(\xi) = \int_{-\infty}^{\infty} x\,p(x)\,dx. \tag{16-6}$$

The standard deviation is defined as the square root of the variance of $\xi$,
which in turn is defined as the mean of the square of the deviation of $\xi$
from its mean. Thus, if $\mu = E(\xi)$,

$$\operatorname{var}(\xi) = \int_{-\infty}^{\infty} (x - \mu)^2 p(x)\,dx; \qquad \operatorname{s.d.}(\xi) = \sqrt{\operatorname{var}(\xi)}. \tag{16-7}$$


EXAMPLES 
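2. The variance of the distribution defined by (16-4) is

$$\frac{1}{2\varepsilon} \int_{-\varepsilon}^{\varepsilon} x^2\,dx = \frac{\varepsilon^2}{3}.$$

3. The variance of the normal distribution is

$$\frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} x^2 e^{-x^2/(2\sigma^2)}\,dx
= \frac{2\sigma^2}{\sqrt{\pi}} \int_0^{\infty} t^{1/2} e^{-t}\,dt = \sigma^2.$$

Thus, if we wish the distributions (16-4) and (16-5) to have the same
standard deviation, we must choose

$$\sigma = \frac{\varepsilon}{\sqrt{3}}.$$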
If $\xi_1, \xi_2, \ldots, \xi_m$ are random variables with means $\mu_1, \mu_2, \ldots, \mu_m$, and
if $a_1, a_2, \ldots, a_m$ are arbitrary constants, the quantity

$$\xi = a_1 \xi_1 + a_2 \xi_2 + \cdots + a_m \xi_m \tag{16-8}$$

is again a random variable, and its mean is given by

$$E(\xi) = a_1 \mu_1 + a_2 \mu_2 + \cdots + a_m \mu_m. \tag{16-9}$$

The importance of the normal probability distribution (16-5) is based on
the following fact: If the variables $\xi_1, \xi_2, \ldots, \xi_m$ are independent (i.e., if
the value of any $\xi_i$ has no bearing on the value of any other $\xi_j$), and if
they are normally distributed with standard deviations $\sigma_1, \sigma_2, \ldots, \sigma_m$,
then the variable defined by (16-8) is also normally distributed, and its
standard deviation is given by

$$\operatorname{s.d.}(\xi) = \sqrt{a_1^2 \sigma_1^2 + a_2^2 \sigma_2^2 + \cdots + a_m^2 \sigma_m^2}. \tag{16-10}$$

In a limiting sense, the above statement is also true if the random variables
$\xi_i$ themselves are not normally distributed.

Let us now apply the above results to the problem of forming differences
considered in §16.2. Assuming that the local rounding errors $\varepsilon_n$ are
independent, normally distributed random variables with mean zero and
standard deviation $\sigma$, we find that $r_n^{(k)}$ is a normally distributed variable
with variance

$$\operatorname{var}(r_n^{(k)}) = \sigma^2 \sum_{m=0}^{k} \binom{k}{m}^2.$$

Comparing coefficients of $x^k$ in the identity

$$\left[1 + \binom{k}{1} x + \binom{k}{2} x^2 + \cdots + x^k\right]
\left[1 + \binom{k}{1} x + \binom{k}{2} x^2 + \cdots + x^k\right]
= 1 + \binom{2k}{1} x + \binom{2k}{2} x^2 + \cdots + x^{2k},$$

we find that

$$\sum_{m=0}^{k} \binom{k}{m}^2 = \binom{2k}{k} = \frac{(2k)!}{(k!)^2}.$$

We thus obtain

$$\operatorname{var}(r_n^{(k)}) = \frac{(2k)!}{(k!)^2}\,\sigma^2, \qquad
\operatorname{s.d.}(r_n^{(k)}) = \frac{\sqrt{(2k)!}}{k!}\,\sigma.$$

In order to obtain an intuitive notion of the size of the coefficient
appearing here, we approximate the factorials by Stirling's formula
(see §9.3):

$$\frac{(2k)!}{(k!)^2} \sim \frac{\sqrt{4\pi k}\,(2k/e)^{2k}}{2\pi k\,(k/e)^{2k}} = \frac{2^{2k}}{\sqrt{\pi k}}.$$

This yields

$$\operatorname{s.d.}(r_n^{(k)}) \sim \frac{2^k}{(\pi k)^{1/4}}\,\sigma = \frac{1}{\sqrt{3}\,(\pi k)^{1/4}}\, 2^k \varepsilon \tag{16-11}$$

in the case of the rectangular distribution (16-4). This relation shows
that the ratio of the standard deviation and the theoretical maximum
$2^k \varepsilon$ is not as small as might be expected.
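To get a feeling for these magnitudes, the exact coefficient $\sqrt{(2k)!}/k!$, its Stirling approximation, and the maximum $2^k \varepsilon$ can be tabulated side by side. This is a small sketch under assumptions: the function names are hypothetical, and $\varepsilon = 0.5$ (half a unit of the last digit) with $\sigma = \varepsilon/\sqrt{3}$ is chosen for illustration.

```python
from math import comb, pi, sqrt

def sd_exact(k, sigma):
    """Standard deviation of the k-th difference error: sqrt(C(2k, k)) * sigma."""
    return sqrt(comb(2 * k, k)) * sigma

def sd_stirling(k, sigma):
    """Stirling approximation 2**k * sigma / (pi * k)**(1/4)."""
    return 2 ** k * sigma / (pi * k) ** 0.25

eps = 0.5              # assumed: half a unit of the last digit
sigma = eps / sqrt(3)  # rectangular distribution (16-4)

for k in range(1, 9):
    exact = sd_exact(k, sigma)
    approx = sd_stirling(k, sigma)
    worst = 2 ** k * eps
    print(f"k={k}: exact={exact:.3f}  stirling={approx:.3f}  "
          f"max={worst:.1f}  exact/max={exact / worst:.3f}")
```

The last column decreases only like $k^{-1/4}$, which is the quantitative content of the remark that the ratio is not as small as might be expected.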
EXAMPLE 
4. Twenty-four consecutive values of the exponential function given in
Comrie's table (Comrie [1961]) gave the following experimental values of
the standard deviations of the differences $\Delta^k y_n$ (in units of the least
significant digit):


Table 16.3
k                            3       4       5       6
experimental s.d.            1.53    2.28    4.22    7.64
theoretical s.d. (16-11)     1.29    2.42    4.57    8.78


Problem


3. For $n$ given machine numbers $a_1, a_2, \ldots, a_n$ the sums

$$s_k = a_1^2 + a_2^2 + \cdots + a_k^2, \qquad S_k = s_1 + s_2 + \cdots + s_k \qquad (k = 1, 2, \ldots, n)$$

are formed by means of the recurrence relations

$$s_0 = 0, \quad s_k = s_{k-1} + a_k^2; \qquad S_0 = 0, \quad S_k = S_{k-1} + s_k$$

$(k = 1, 2, \ldots, n)$. What is the resulting standard deviation in $s_n$ and $S_n$
if the rounding errors at each step are considered independent random
variables with the distribution (16-4)? What are the theoretically possible
maximum errors?


16.4 A Scheme for the Study of Propagated Error 


In the example considered above, the accumulated error was due 
exclusively to the rounding of the initial data of the computation. All 
intermediate computations were performed exactly. We shall now discuss 
a scheme of fairly wide applicability which permits us to take into account 
rounding in the intermediate results. 

Many numerical algorithms consist in generating a sequence of numbers
$q_0, q_1, q_2, \ldots$ that are defined by recurrence relations of the form

$$q_n = Q_n(q_0, q_1, \ldots, q_{n-1}). \tag{16-12}$$

In actual numerical computation not the numbers $q_n$ are generated, but
certain machine numbers $\tilde q_n$ (not necessarily the correctly rounded values
$q_n^*$), which satisfy the recurrence relations

$$\tilde q_n = \tilde Q_n(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1}). \tag{16-13}$$

Here the symbols $\tilde Q_n$ denote rounded and otherwise approximated values
of the functions $Q_n$. Instead of working with relation (16-13), which is
difficult mathematically, we write (16-13) in the form

$$\tilde q_n = Q_n(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1}) + \varepsilon_n. \tag{16-14}$$

We consider this relation as the definition of the local rounding error $\varepsilon_n$.
Thus,

$$\varepsilon_n = \tilde Q_n(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1}) - Q_n(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1}).$$

By analyzing the computational process used to evaluate $\tilde Q_n$, some
statement can usually be made about size and distribution of the $\varepsilon_n$.
Let now the accumulated rounding errors $r_n$ be defined by

$$r_n = \tilde q_n - q_n, \qquad n = 0, 1, 2, \ldots \tag{16-15}$$

By subtracting (16-12) from (16-14) we find

$$r_n = Q_n(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1}) - Q_n(q_0, q_1, \ldots, q_{n-1}) + \varepsilon_n$$

or, using the mean value theorem,

$$r_n = Q_n^{(0)} r_0 + Q_n^{(1)} r_1 + \cdots + Q_n^{(n-1)} r_{n-1} + \varepsilon_n, \tag{16-16}$$

where $Q_n^{(k)}$ denotes a partial derivative,

$$Q_n^{(k)} = \frac{\partial Q_n}{\partial q_k}, \qquad k = 0, 1, \ldots, n - 1,$$

taken at some point between $(q_0, q_1, \ldots, q_{n-1})$ and $(\tilde q_0, \tilde q_1, \ldots, \tilde q_{n-1})$.

If the functions $Q_n$ are linear in the $q_k$, the partial derivatives $Q_n^{(k)}$ do not
depend on the previous rounding errors $r_0, r_1, \ldots, r_{n-1}$, and (16-16) then
represents a linear difference equation for the quantities $r_n$. If the $Q_n^{(k)}$
are constants, and if $Q_n^{(k)} = 0$ for $k < n - m$, then (16-16) is a difference
equation of order $m$ with constant coefficients and can be solved by the
methods of §6.7 and §6.8. There results a solution of the form

$$r_n = \sum_{m=0}^{n} d_{n,m}\,\varepsilon_m, \tag{16-17}$$

where the coefficients $d_{n,m}$ themselves satisfy a certain difference equation.

A solution of the form (16-17) also must exist in the general case of a
recurrence relation of the form (16-12) with linear functions $Q_n$, since all
$\varepsilon_m$ enter into (16-16) only linearly. In order to determine the coefficients
$d_{n,m}$ we substitute (16-17) into (16-16). There results

$$\begin{aligned}
r_n ={} & Q_n^{(0)} d_{0,0}\varepsilon_0 \\
& + Q_n^{(1)} (d_{1,0}\varepsilon_0 + d_{1,1}\varepsilon_1) \\
& + Q_n^{(2)} (d_{2,0}\varepsilon_0 + d_{2,1}\varepsilon_1 + d_{2,2}\varepsilon_2) \\
& + \cdots \\
& + Q_n^{(n-1)} (d_{n-1,0}\varepsilon_0 + d_{n-1,1}\varepsilon_1 + \cdots + d_{n-1,n-1}\varepsilon_{n-1}) \\
& + \varepsilon_n.
\end{aligned}$$

The expression on the right must be identical with (16-17) for all possible
values of $\varepsilon_0, \varepsilon_1, \ldots, \varepsilon_n$. It follows that the coefficients of corresponding
$\varepsilon_m$ must be equal. Comparing the coefficients of $\varepsilon_n$ we immediately find

$$d_{n,n} = 1, \qquad n = 0, 1, 2, \ldots, \tag{16-18}$$

and hence, comparing the coefficients of $\varepsilon_{n-1}, \varepsilon_{n-2}, \ldots, \varepsilon_0$,

$$\begin{aligned}
d_{n,n-1} &= Q_n^{(n-1)}, \\
d_{n,n-2} &= Q_n^{(n-1)} d_{n-1,n-2} + Q_n^{(n-2)}, \\
&\;\;\vdots \\
d_{n,0} &= Q_n^{(n-1)} d_{n-1,0} + Q_n^{(n-2)} d_{n-2,0} + \cdots + Q_n^{(1)} d_{1,0} + Q_n^{(0)}.
\end{aligned} \tag{16-19}$$

These relations have the form of difference equations for the $d_{n,m}$ with
respect to the first subscript and can be solved in special cases.
We summarize the above result as follows: 


Theorem 16.4 If (16-12) is a linear algorithm, i.e., if

$$q_n = Q_n^{(0)} q_0 + Q_n^{(1)} q_1 + \cdots + Q_n^{(n-1)} q_{n-1} + p_n \tag{16-20}$$

for certain constants $Q_n^{(k)}$ and $p_n$ ($n = 0, 1, 2, \ldots$; $k = 0, 1, \ldots,
n - 1$), then the dependence of the accumulated rounding errors on
the local rounding errors is given by (16-17), where the coefficients
$d_{n,m}$ are determined by (16-19).


The coefficients $d_{n,m}$ may be called influence coefficients, because they
indicate the influence of the local errors on the accumulated error.
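For a linear algorithm the influence coefficients can be generated mechanically from the relations $d_{n,n} = 1$ and $d_{n,m} = \sum_{k=m}^{n-1} Q_n^{(k)} d_{k,m}$, which is the general form of (16-19). The following sketch is illustrative only: the function `influence_coefficients` and the convention of passing the constants $Q_n^{(k)}$ as a Python callable `Q(n, k)` are assumptions, not part of the text.

```python
def influence_coefficients(Q, N):
    """Return d with d[n][m] for 0 <= m <= n <= N.

    Q(n, k) must return the constant Q_n^(k) of the linear algorithm (16-20).
    Uses d[n][n] = 1 and d[n][m] = sum_{k=m}^{n-1} Q(n, k) * d[k][m].
    """
    d = [[0.0] * (N + 1) for _ in range(N + 1)]
    for n in range(N + 1):
        d[n][n] = 1.0
        for m in range(n):
            d[n][m] = sum(Q(n, k) * d[k][m] for k in range(m, n))
    return d

def q_summation(n, k):
    """Summation q_n = q_{n-1} + a_n: only Q_n^(n-1) = 1 is nonzero."""
    return 1.0 if k == n - 1 else 0.0

A = 0.5  # assumed slope of a linear iteration q_n = A * q_{n-1} + b

def q_iteration(n, k):
    """Linear iteration: only Q_n^(n-1) = A is nonzero."""
    return A if k == n - 1 else 0.0

N = 6
d_sum = influence_coefficients(q_summation, N)
d_iter = influence_coefficients(q_iteration, N)
print(d_sum[N])   # all ones
print(d_iter[N])  # powers A**(N - m)
```

The two test cases correspond to the first two applications of §16.5, where the same coefficients are found by hand.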

Once the influence coefficients have been calculated, the relation 
(16-17) may be used to make either a deterministic or a probabilistic 
statement about the accumulated rounding error. Assuming 


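$$|\varepsilon_n| \le \varepsilon, \qquad n = 0, 1, 2, \ldots, \tag{16-21}$$

we find

$$|r_n| \le \varepsilon \sum_{m=0}^{n} |d_{n,m}|, \qquad n = 0, 1, 2, \ldots \tag{16-22}$$

The bound on the right is attained for $\varepsilon_m = \varepsilon\,\operatorname{sign} d_{n,m}$ and thus cannot be
improved. Assuming that the local errors are independent random
variables with

$$E(\varepsilon_m) = \mu, \qquad \operatorname{var}(\varepsilon_m) = \sigma^2, \qquad m = 0, 1, 2, \ldots, \tag{16-23}$$

we find for the random variables $r_n$, using (16-9) and (16-10),

$$E(r_n) = \mu \sum_{m=0}^{n} d_{n,m}, \qquad
\operatorname{var}(r_n) = \sigma^2 \sum_{m=0}^{n} d_{n,m}^2, \qquad n = 0, 1, 2, \ldots \tag{16-24}$$

In the case where the expected values and variances of the local errors
depend on $m$ (which is the appropriate assumption for floating point
arithmetic) the assumptions (16-23) are to be replaced by

$$E(\varepsilon_m) = \mu_m, \qquad \operatorname{var}(\varepsilon_m) = \sigma_m^2, \qquad m = 0, 1, 2, \ldots, \tag{16-25}$$

yielding

$$E(r_n) = \sum_{m=0}^{n} d_{n,m}\,\mu_m, \qquad
\operatorname{var}(r_n) = \sum_{m=0}^{n} d_{n,m}^2\,\sigma_m^2. \tag{16-26}$$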


In a qualitative sense, the above results remain true even when the
functions $Q_n$ are not linear. The derivatives $Q_n^{(k)}$ in (16-16) are then
evaluated at $(q_0, q_1, \ldots, q_{n-1})$, and terms of higher order in Taylor's
expansion are neglected. This can frequently be justified if an a priori
bound for the accumulated errors $r_n$ is known.


16.5 Applications 


We now shall apply the results of §16.4 to a number of special algorithms. 
Most of these have already been discussed in preceding chapters. For the 
sake of simplicity we shall assume that fixed point arithmetic is used. The 
hypotheses (16-21) and (16-23) are then appropriate. In many cases the 
results remain qualitatively true for floating point arithmetic. 

(i) Evaluation of a sum. We begin by considering the extremely simple
example of evaluating numerically the sum

$$S = \sum_{n=1}^{N} a_n,$$

where $a_1, a_2, \ldots, a_N$ are given real numbers. This can be put into
algorithmic form by setting

$$q_0 = 0, \qquad q_n = q_{n-1} + a_n, \quad n = 1, 2, \ldots, N. \tag{16-27}$$

Clearly, $q_N = S$. We thus have

$$Q_n(q_0, q_1, \ldots, q_{n-1}) = q_{n-1} + a_n.$$

In actual computation, in place of the exact numbers $q_n$ numerical
values $\tilde q_n$ are generated according to the scheme

$$\tilde q_0 = 0, \qquad \tilde q_n = \tilde q_{n-1} + a_n^*. \tag{16-28}$$

(In floating arithmetic, the second of the relations (16-28) would have to
be replaced by $\tilde q_n = (\tilde q_{n-1} + a_n^*)^*$.) According to our scheme, we
replace the second relation (16-28) by

$$\tilde q_n = \tilde q_{n-1} + a_n + \varepsilon_n,$$

showing that the local error $\varepsilon_n$ is given by

$$\varepsilon_n = a_n^* - a_n,$$

and thus, in fixed point binary arithmetic, satisfies (16-21) with $\varepsilon = 2^{-t-1}$,
or (16-23) with

$$\mu = 0, \qquad \sigma^2 = \tfrac{1}{3}\varepsilon^2 = \tfrac{1}{12}\,2^{-2t}. \tag{16-29}$$

In view of

$$Q_n^{(k)} = 0, \quad k = 0, 1, 2, \ldots, n - 2, \qquad Q_n^{(n-1)} = 1,$$

the relations (16-19) reduce to

$$d_{n,n-1} = 1, \qquad d_{n,k} = d_{n-1,k}, \quad n > k,$$

and thus yield $d_{n,k} = 1$, $n \ge k$. Since $\varepsilon_0 = 0$, relation (16-17) thus shows that

$$r_n = \sum_{m=1}^{n} \varepsilon_m. \tag{16-30}$$

(In the present simple example this result could have been obtained
directly, see problem 3.) We thus find from (16-22) and (16-24)

$$|r_n| \le n\varepsilon, \qquad \operatorname{s.d.}(r_n) = \sqrt{n}\,\sigma. \tag{16-31}$$

The result shows that while the theoretically greatest possible error in a
sum of $n$ terms grows like $n$, the standard deviation grows only like $\sqrt{n}$.
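The contrast between the two growth rates in (16-31) can be observed in a small experiment based on the stochastic model. The sketch below is only an illustration: it draws the local errors from the rectangular distribution (16-4) with a pseudo-random generator instead of performing actual fixed point roundings, and all names as well as the values $n = 100$, $\varepsilon = 0.5$ are assumptions.

```python
import random
from math import sqrt

def accumulated_error_samples(n, eps, trials, seed=1):
    """Samples of r_n = eps_1 + ... + eps_n with eps_m uniform on [-eps, eps]."""
    rng = random.Random(seed)
    return [sum(rng.uniform(-eps, eps) for _ in range(n)) for _ in range(trials)]

def sample_sd(xs):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(xs) / len(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / (len(xs) - 1))

n, eps = 100, 0.5
sigma = eps / sqrt(3)
samples = accumulated_error_samples(n, eps, trials=2000)

print("sample s.d. of r_n :", sample_sd(samples))
print("sqrt(n) * sigma    :", sqrt(n) * sigma)
print("largest |r_n| seen :", max(abs(r) for r in samples))
print("bound n * eps      :", n * eps)
```

In such runs the observed errors stay far below the rigorous bound $n\varepsilon$, in agreement with the $\sqrt{n}$ law.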

(ii) Iteration. We next consider the algorithm of solving the equation

$$x = f(x)$$

by determining the limit of the sequence $\{x_n\}$ defined by algorithm 4.1:

$$x_n = f(x_{n-1}).$$

Here the functions $Q_n$ in (16-12) are given by

$$Q_n(q_0, q_1, \ldots, q_{n-1}) = f(q_{n-1}).$$

To simplify matters we shall consider only the idealized situation where

$$f(x) = ax + b$$

and where $a$ and $b$ are constant machine numbers. For convergence
we must assume (see theorem 4.2) that $|a| < 1$. On the machine, the
theoretical recurrence relation

$$q_n = a q_{n-1} + b$$

is replaced by

$$\tilde q_n = (a \tilde q_{n-1})^* + b.$$

Writing the latter relation in the form

$$\tilde q_n = a \tilde q_{n-1} + b + \varepsilon_n,$$

we see that

$$\varepsilon_n = (a \tilde q_{n-1})^* - a \tilde q_{n-1}.$$

Thus, in fixed point arithmetic, (16-21) again holds with $\varepsilon = 2^{-t-1}$, and
(16-23) with the values (16-29). In view of

$$Q_n^{(k)} = 0, \quad k = 0, 1, \ldots, n - 2, \qquad Q_n^{(n-1)} = a,$$

the recurrence relations (16-19) simplify to

$$d_{n,n} = 1, \qquad d_{n,m} = a\,d_{n-1,m} \quad (n > m \ge 0)$$

and permit the immediate solution

$$d_{n,m} = a^{n-m}, \qquad m = 0, 1, \ldots, n.$$

It follows that

$$r_n = \sum_{m=0}^{n} a^{n-m} \varepsilon_m.$$

In view of $|a| < 1$ we thus find

$$|r_n| \le \frac{1 - |a|^{n+1}}{1 - |a|}\,\varepsilon \le \frac{1}{1 - |a|}\,\varepsilon \tag{16-32}$$

and for the stochastic model

$$\operatorname{var}(r_n) = \sigma^2\,\frac{1 - a^{2n+2}}{1 - a^2} \le \frac{\sigma^2}{1 - a^2}. \tag{16-33}$$

The bounds (16-32) and (16-33) are remarkable for the fact that they are
independent of $n$, the number of iteration steps. An algorithm with this
property must be called stable under any reasonable definition of this
term. Newton's method in particular, which corresponds to the case
$a = 0$ of the above simplified model, enjoys an extreme degree of stability.
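The independence of the bound (16-32) from the number of steps can be checked with exact rational arithmetic. The following sketch models fixed point rounding by rounding each product to the nearest multiple of $2^{-t}$; the helper names, the word length $t = 10$, and the values of $a$, $b$, $q_0$ are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import floor

def round_fixed(x, t):
    """Round the exact rational x to the nearest multiple of 2**-t."""
    unit = Fraction(1, 2 ** t)
    return floor(x / unit + Fraction(1, 2)) * unit

def iteration_errors(a, b, q0, t, steps):
    """Accumulated errors r_n of q_n = a*q_{n-1} + b with the product rounded."""
    exact, machine, errors = q0, q0, []
    for _ in range(steps):
        exact = a * exact + b                       # theoretical recurrence
        machine = round_fixed(a * machine, t) + b   # machine recurrence
        errors.append(machine - exact)
    return errors

t = 10
a, b, q0 = Fraction(3, 4), Fraction(1, 8), Fraction(1)  # machine numbers for t = 10
eps = Fraction(1, 2 ** (t + 1))

errors = iteration_errors(a, b, q0, t, steps=200)
print("largest |r_n|   :", float(max(abs(r) for r in errors)))
print("eps / (1 - |a|) :", float(eps / (1 - abs(a))))
```

However many steps are taken, the accumulated error never exceeds $\varepsilon/(1 - |a|)$, here four times the local bound.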

(iii) Generating $\cos n\varphi$ and $\sin n\varphi$ by recurrence relations. As our next
example, we consider the algorithm for generating the sequences $\{\cos n\varphi\}$
and $\{\sin n\varphi\}$ suggested by example 3, §6.3. (These
sequences are required, e.g., for computing the sums of Fourier series.)
The difference equation

$$t_n - 2\cos\varphi\; t_{n-1} + t_{n-2} = 0 \tag{16-34}$$

has the solutions $t_n = \cos n\varphi$ and $t_n = \sin n\varphi$ and can therefore be used
to generate these functions recursively if the values of $\cos\varphi$ and $\sin\varphi$ are
known. Here the theory of §16.4 applies with

$$Q_n(t_0, t_1, \ldots, t_{n-1}) = 2\cos\varphi\; t_{n-1} - t_{n-2}.$$

In numerical computation, (16-34) is replaced by

$$\tilde t_n = (2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1})^* - \tilde t_{n-2}, \tag{16-35}$$

where $\widetilde{\cos}\,\varphi$ denotes a machine value of $\cos\varphi$. Writing (16-35) in the
form

$$\tilde t_n = 2\cos\varphi\; \tilde t_{n-1} - \tilde t_{n-2} + \varepsilon_n,$$

we see that

$$\varepsilon_n = (2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1})^* - 2\cos\varphi\; \tilde t_{n-1}.$$

Adding and subtracting $2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1}$, we also have

$$\varepsilon_n = (2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1})^* - 2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1} + 2(\widetilde{\cos}\,\varphi - \cos\varphi)\,\tilde t_{n-1}.$$

Here we see that $\varepsilon_n$ is due to two sources: (a) rounding the product
$2\,\widetilde{\cos}\,\varphi\; \tilde t_{n-1}$, (b) replacing $\cos\varphi$ by $\widetilde{\cos}\,\varphi$. Both errors can be estimated
and permit us to make assumptions such as (16-21) and (16-23).

To determine the $d_{n,m}$ we observe that

$$Q_n^{(k)} = 0, \quad k = 0, 1, \ldots, n - 3, \qquad Q_n^{(n-2)} = -1, \qquad Q_n^{(n-1)} = 2\cos\varphi.$$

The relations (16-19) thus yield

$$\begin{aligned}
d_{m,m} &= 1, \\
d_{m+1,m} &= 2\cos\varphi, \\
d_{n,m} &= 2\cos\varphi\; d_{n-1,m} - d_{n-2,m}, \qquad n \ge m + 2.
\end{aligned} \tag{16-36}$$

These equations show that, for a fixed value of $m$, the quantities $d_{n,m}$
themselves are a solution of the difference equation (16-34), with the
initial values given at $n = m$ and $n = m + 1$. We thus must have

$$d_{n,m} = A_m \cos n\varphi + B_m \sin n\varphi, \qquad n = m, m + 1, \ldots,$$

where $A_m$ and $B_m$ are to be determined from the first two relations (16-36).
Some algebra yields (if $0 < \varphi < \pi$)

$$A_m = -\frac{\sin (m - 1)\varphi}{\sin\varphi}, \qquad B_m = \frac{\cos (m - 1)\varphi}{\sin\varphi},$$

and we thus get, after some further simplification,

$$d_{n,m} = \frac{\sin (n - m + 1)\varphi}{\sin\varphi}.$$

Since $\varepsilon_0 = 0$, the general theory thus yields

$$r_n = \sum_{m=1}^{n} \frac{\sin (n - m + 1)\varphi}{\sin\varphi}\,\varepsilon_m.$$

Using the crude estimate $|\sin (n - m + 1)\varphi| \le 1$, we thus have

$$|r_n| \le \frac{n\varepsilon}{\sin\varphi}.$$

Using the known formula

$$\sum_{m=1}^{n} \sin^2 m\varphi = \frac{2n + 1}{4} - \frac{\sin (2n + 1)\varphi}{4\sin\varphi},$$

we get for the variance the expression

$$\operatorname{var}(r_n) = \sigma^2 \sum_{m=1}^{n} \left(\frac{\sin (n - m + 1)\varphi}{\sin\varphi}\right)^2
= \frac{\sigma^2}{\sin^2\varphi} \left[\frac{n}{2} + \frac{1}{4} - \frac{\sin (2n + 1)\varphi}{4\sin\varphi}\right].$$

As in the algorithm for finding the sum of $n$ numbers, the standard deviation
of the accumulated error grows only like $\sqrt{n}$, while the rigorous
estimate grows like $n$. This relatively high stability of the recursive
method for generating $\cos n\varphi$ and $\sin n\varphi$ (or, what amounts to the same,
the Chebyshev polynomials) has been observed experimentally (Lanczos
[1955]).
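The closed form of the influence coefficients and the resulting variance factor can be verified numerically. The sketch below is an illustration under assumptions: the function name is hypothetical, ordinary double precision floats stand in for exact arithmetic, and $\varphi = 0.3$, $n = 200$ are arbitrary test values.

```python
from math import cos, isclose, sin

def influence_by_recurrence(phi, count):
    """d_j = d_{m+j, m} from d_0 = 1, d_1 = 2 cos(phi), d_j = 2 cos(phi) d_{j-1} - d_{j-2}."""
    d = [1.0, 2.0 * cos(phi)]
    while len(d) < count:
        d.append(2.0 * cos(phi) * d[-1] - d[-2])
    return d[:count]

phi, n = 0.3, 200
d = influence_by_recurrence(phi, n)

# Closed form sin((j + 1) * phi) / sin(phi), where j = n - m.
closed = [sin((j + 1) * phi) / sin(phi) for j in range(n)]

# Factor multiplying sigma^2 in var(r_n), computed in two ways.
sum_of_squares = sum(x * x for x in d)
formula = ((2 * n + 1) / 4 - sin((2 * n + 1) * phi) / (4 * sin(phi))) / sin(phi) ** 2

print(sum_of_squares, formula)
print("rigorous factor n / sin(phi):", n / sin(phi))
```

The sum of squares grows proportionally to $n$, so that the standard deviation grows like $\sqrt{n}$, whereas the rigorous factor $n/\sin\varphi$ grows linearly.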

(iv) Horner's scheme. We next consider the propagation of error in the
evaluation of a polynomial by Horner's scheme (algorithm 3.4), assuming
that the coefficients $a_0, a_1, \ldots, a_N$ of the polynomial and the value of the
variable $x$ are machine numbers. The scheme then consists in generating
a finite sequence of numbers $q_0, q_1, \ldots, q_N$ by the formulas

$$q_0 = a_0, \qquad q_n = a_n + x q_{n-1}, \quad n = 1, 2, \ldots, N; \tag{16-37}$$

the desired value is $q_N$. The functions $Q_n$ are evidently given by

$$Q_n(q_0, q_1, \ldots, q_{n-1}) = a_n + x q_{n-1}.$$

In numerical work, the second relation (16-37) is replaced by $\tilde q_n = a_n
+ (x \tilde q_{n-1})^*$. If the right-hand side is written as $a_n + x \tilde q_{n-1} + \varepsilon_n$, we see that,
as in (ii) above, the local rounding error equals the rounding error in the
fixed point multiplication $x \tilde q_{n-1}$. Thus the assumptions (16-21) and
(16-23) are again justified with $\varepsilon = 2^{-t-1}$, $\mu = 0$, $\sigma^2 = \tfrac{1}{12}\,2^{-2t}$. We now
have

$$Q_n^{(k)} = 0, \quad k = 0, 1, \ldots, n - 2, \qquad Q_n^{(n-1)} = x,$$

and thus find, as above,

$$d_{n,m} = x^{n-m}.$$

Since $\varepsilon_0 = 0$ we thus have

$$r_n = \sum_{m=1}^{n} x^{n-m} \varepsilon_m$$

and find in the usual way

$$|r_n| \le \begin{cases} \dfrac{|x|^n - 1}{|x| - 1}\,\varepsilon, & |x| \ne 1, \\[1ex] n\varepsilon, & |x| = 1, \end{cases}$$

and, under the statistical hypothesis,

$$\operatorname{var}(r_n) = \begin{cases} \dfrac{x^{2n} - 1}{x^2 - 1}\,\sigma^2, & |x| \ne 1, \\[1ex] n\sigma^2, & |x| = 1. \end{cases}$$

Thus, for the evaluation of high degree polynomials in fixed point arithmetic,
Horner's scheme is unstable when $|x| > 1$ and stable when $|x| \le 1$.
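The different behaviour for $|x| > 1$ and $|x| < 1$ can be seen in a small experiment with exact rational arithmetic. This is a sketch under assumptions, not the author's procedure: fixed point rounding is modelled by rounding each product to the nearest multiple of $2^{-t}$, overflow and scaling are ignored, and the coefficients are arbitrary machine numbers generated by a simple formula.

```python
from fractions import Fraction
from math import floor

def round_fixed(x, t):
    """Round the exact rational x to the nearest multiple of 2**-t."""
    unit = Fraction(1, 2 ** t)
    return floor(x / unit + Fraction(1, 2)) * unit

def horner_error(coeffs, x, t):
    """Accumulated error of Horner's scheme when every product is rounded."""
    exact = machine = coeffs[0]
    for a_n in coeffs[1:]:
        exact = a_n + x * exact
        machine = a_n + round_fixed(x * machine, t)
    return machine - exact

def horner_bound(x, n, eps):
    """Rigorous bound on |r_n| derived above for Horner's scheme."""
    ax = abs(x)
    return n * eps if ax == 1 else (ax ** n - 1) / (ax - 1) * eps

t = 8
eps = Fraction(1, 2 ** (t + 1))
# Arbitrary coefficients that are multiples of 2**-t (degree N = 20).
coeffs = [Fraction((37 * k) % 256 - 128, 256) for k in range(21)]
N = len(coeffs) - 1

for x in (Fraction(1, 2), Fraction(3, 2)):
    r = horner_error(coeffs, x, t)
    print(f"x = {x}: |r_N| = {float(abs(r)):.3e}, bound = {float(horner_bound(x, N, eps)):.3e}")
```

For $x = 1/2$ the bound stays below $2\varepsilon$ for every degree, while for $x = 3/2$ it grows geometrically with $N$.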

(v) Numerical integration of differential equations. We finally consider
the numerical integration of the simple but typical initial value problem

$$y' = Ay, \qquad y(0) = 1,$$

where $A$ is a real constant, $A \ne 0$, (a) by Euler's method; (b) by the
integration scheme based on the midpoint formula discussed in §14.8.
The approximate values generated by Euler's method satisfy

$$y_0 = 1, \qquad y_n = y_{n-1} + Ah\,y_{n-1}, \quad n = 1, 2, \ldots$$

This is an algorithm of the form (16-12) where $Q_n^{(k)} = 0$ ($k = 0, 1, \ldots,
n - 2$), $Q_n^{(n-1)} = 1 + Ah$. If the local errors are defined by

$$\tilde y_n = \tilde y_{n-1} + Ah\,\tilde y_{n-1} + \varepsilon_n,$$

we find, by computations similar to the ones carried out earlier, that the
accumulated error $r_n = \tilde y_n - y_n$ can be expressed in the form

$$r_n = \sum_{m=1}^{n} (1 + Ah)^{n-m} \varepsilon_m.$$

We thus obtain

$$|r_n| \le \frac{(1 + Ah)^n - 1}{hA}\,\varepsilon, \qquad
\operatorname{var}(r_n) = \frac{(1 + Ah)^{2n} - 1}{2Ah + h^2 A^2}\,\sigma^2.$$

Some simplification is possible by observing that for $h \to 0$, $nh = x$ fixed
and positive, $(1 + Ah)^n = e^{Ax} + O(h)$. We thus find

$$|r_n| \le \frac{\varepsilon}{h} \left[\frac{e^{Ax} - 1}{A} + O(h)\right], \qquad
\operatorname{var}(r_n) = \frac{\sigma^2}{h} \left[\frac{e^{2Ax} - 1}{2A} + O(h)\right]. \tag{16-38}$$

These relations show that Euler's method is numerically stable if $A < 0$,
provided that $h$ is sufficiently small. The method is unstable if $A > 0$, but
this instability is not serious, since then the solution itself grows exponentially.
Analogous statements hold for all methods based on Taylor's
expansion, and also for all Adams' methods.

If the midpoint rule is used, the values $y_n$ satisfy

$$y_0 = 1, \qquad y_1 = e^{Ah} \ \text{(ideally)}, \qquad y_n = y_{n-2} + 2hA\,y_{n-1}, \quad n \ge 2.$$

This is of the form (16-12) where

$$Q_n(y_0, y_1, \ldots, y_{n-1}) = 2hA\,y_{n-1} + y_{n-2}.$$

A consideration similar to that under (iii) above shows that the coefficients
$d_{n,m}$ satisfy, for a fixed value of $m$, the difference equation

$$d_{n,m} - 2hA\,d_{n-1,m} - d_{n-2,m} = 0 \qquad (n \ge m + 2) \tag{16-39}$$

and the starting conditions

$$d_{m,m} = 1, \qquad d_{m+1,m} = 2hA. \tag{16-40}$$

The characteristic polynomial $p$ of (16-39) is given by

$$p(z) = z^2 - 2hAz - 1;$$

its two zeros are

$$z_1 = Ah + \sqrt{1 + (Ah)^2} = e^{Ah} + O(h^3), \qquad z_2 = -z_1^{-1}.$$

The solution $d_{n,m}$ of (16-39) satisfying (16-40) is found to be

$$d_{n,m} = \frac{z_1^{n-m+1} - z_2^{n-m+1}}{z_1 - z_2}.$$

A somewhat elaborate but elementary computation using the approximations
(see §14.8)

$$z_1^n = e^{Ax} + O(h), \qquad z_2^n = (-1)^n e^{-Ax} + O(h),$$

valid for $h \to 0$, $nh = x$ fixed and positive, shows that

$$|r_n| \le \varepsilon \sum_{m=1}^{n} |d_{n,m}| = \frac{\varepsilon}{h} \left[\frac{e^{|A|x} - 1}{2|A|} + O(h)\right] \tag{16-41}$$

and, if $\operatorname{var}(\varepsilon_m) = \sigma^2$,

$$\operatorname{var}(r_n) = \frac{\sigma^2}{h} \left[\frac{\sinh 2Ax}{4A} + O(h)\right]. \tag{16-42}$$

These relations confirm the result of §14.8 that the midpoint formula is
always unstable, even when $A < 0$. It produces exponential growth of
the rounding error also when the exact solution is exponentially decreasing.
It is clear that such a method cannot be used for the numerical integration
of a differential equation such as $y' = -y$. In a qualitative sense, this
negative result is true for all methods based on numerical integration where
the numerical integration is performed over an interval comprising several
steps. Simpson's rule, in particular, if applied to the numerical solution
of differential equations, also suffers from this kind of instability, see
problem 11.
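The difference between the two methods for a decaying solution can be made concrete by summing the absolute values of the influence coefficients, which is the factor multiplying $\varepsilon$ in the rigorous bound (16-22). The sketch below is illustrative only; the function names and the test values $A = -1$, $h = 0.01$, $n = 500$ (so $x = 5$) are assumptions, and the asymptotic value used for comparison is the leading term $(e^{|A|x} - 1)/(2|A|h)$ obtained from the influence coefficients of the midpoint rule.

```python
from math import exp

def euler_influence_sum(A, h, n):
    """Sum over m = 1..n of |1 + A*h|**(n - m) for Euler's method."""
    return sum(abs(1 + A * h) ** j for j in range(n))

def midpoint_influence_sum(A, h, n):
    """Sum over m = 1..n of |d_{n,m}| for the midpoint rule.

    With j = n - m the coefficients obey d_0 = 1, d_1 = 2hA,
    d_j = 2hA * d_{j-1} + d_{j-2}.
    """
    prev, cur = 1.0, 2 * h * A
    total = abs(prev)
    for _ in range(n - 1):
        total += abs(cur)
        prev, cur = cur, 2 * h * A * cur + prev
    return total

A, h, n = -1.0, 0.01, 500
x = n * h

euler = euler_influence_sum(A, h, n)
midpoint = midpoint_influence_sum(A, h, n)
asymptotic = (exp(abs(A) * x) - 1) / (2 * abs(A) * h)

print("Euler    :", euler)
print("midpoint :", midpoint)
print("midpoint, leading asymptotic term:", asymptotic)
```

Although the exact solution $e^{-x}$ decays, the midpoint factor is larger than the Euler factor by almost two orders of magnitude at $x = 5$, and the gap widens exponentially with $x$.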


Problems 


4. In floating arithmetic it is frequently permissible to replace the statistical
hypotheses (16-23) by

$$E(\varepsilon_n) = \mu\,\tilde q_n, \qquad \operatorname{var}(\varepsilon_n) = \sigma^2 \tilde q_n^2, \tag{16-43}$$

where $\mu$ and $\sigma$ are again constants. Discuss the propagation of error in
the computation of the sum

$$q_n = a_1 + a_2 + \cdots + a_n, \tag{16-44}$$

where $a_i > 0$, $i = 1, 2, \ldots$, by the algorithm described under (i) above, if
floating arithmetic is used. Show that, contrary to the fixed point
arithmetic case, the propagated error depends on the order of the terms
in (16-44), and that both the maximum error and the standard deviation
are minimized if the terms are summed in increasing order.

5. Discuss the propagation of rounding error if the geometric series $\sum_{n=0}^{\infty} a^n$,
where $|a| < 1$, is summed in floating point arithmetic, generating $a^n$ by
the formula $a^n = a(a^{n-1})$. Is the resulting process numerically stable?

6. The Fibonacci numbers (see example 7, §6.3) are generated from their
recurrence relation

$$x_0 = x_1 = 1, \qquad x_n = x_{n-1} + x_{n-2}$$

in floating arithmetic. Assuming that the relations (16-43) hold with
$\tilde q_n = \tilde x_n$, what is the standard deviation of the numerical value $\tilde x_n$ for large
values of $n$? If the $\tilde x_n$ are used to compute the number

$$\frac{1 + \sqrt{5}}{2} = \lim_{n \to \infty} \frac{x_{n+1}}{x_n},$$

is this a stable process?

7*. (Generalization of problem 6.) Discuss the propagation of rounding
error in Bernoulli's method for determining a single dominant zero of a
polynomial whose coefficients are exact machine numbers. Discuss both
floating and fixed point arithmetic.

8. Discuss the propagation of rounding error in algorithm 5.6 for extracting
a quadratic factor, assuming fixed point arithmetic.


9. Let, for non-integral $s > 0$, the binomial coefficients $q_n = \binom{s}{n}$ be generated
by the recurrence relation

$$q_0 = 1, \qquad q_n = \frac{s - n + 1}{n}\,q_{n-1}, \quad n = 1, 2, \ldots$$

Assuming fixed point arithmetic, show that

$$\operatorname{var}(r_n) = \sigma^2 S_n(s),$$

where

$$S_n(s) = \binom{s}{n}^2 \sum_{m=1}^{n} \binom{s}{m}^{-2}.$$

(It can be shown that $S_n(s) = n/(3 + 2s) + O(1)$ as $n \to \infty$.)

10. Study the propagation of rounding error in algorithm 12.4 (repeated
extrapolation to the limit), if the same error distribution (16-23) is assumed
in the zeroth column $A_{m,0}$ and in the relations defining the elements of the
$(m + 1)$st column in terms of those of the $m$th column.

11. Show that if the problem discussed under (v) above is solved by Simpson's
formula

$$y_{n+1} - y_{n-1} = \frac{h}{3} (f_{n+1} + 4f_n + f_{n-1})$$

in fixed point arithmetic, the standard deviation of the accumulated 
rounding error behaves for x large like cosh (4Ax). 


Recommended Reading 


The detailed study of the propagation of rounding error in numerical 
computation is of recent origin. The classical paper is by Rademacher 
[1948]. Very detailed accounts of the propagation of error in a large 
number of methods for solving ordinary differential equations are given 
in the author’s books [1962, 1963]. 


Research Problems


1. Make a study, both experimental and (as far as possible) theoretical,
of the propagation of error in the Quotient-Difference algorithm.


2. Discuss the stability of Horner’s scheme in floating point arithmetic. 
3. The hypotheses underlying the statistical theory of propagation of 
rounding error have been criticized as unreliable (Huskey [1949]). Carry 
out some statistical experiments on rounding error propagation and 
compare the results with the theoretical results given above. (See the 
author’s books quoted above for some ideas on how to perform such 
experiments.) 


bibliography 


This list of books and papers contains the titles quoted in the text; it is 
not intended to be complete in any sense of the word. More complete 
bibliographies on many topics of numerical analysis will be found in some 
of the references given below; see in particular Hildebrand [1956] and 
Todd [1962]. 


Aitken, A. C. [1926]: On Bernoulli’s numerical solution of algebraic equations, Proc. 
Roy. Soc. Edinburgh, 46, 289-305. 

Bareiss, E. H. [1960]: Resultant procedure and the mechanization of the Graeffe
process, J. Assoc. Comp. Mach., 7, 346-386.

Bauer, F. L., H. Rutishauser, and E. Stiefel [1963]: New aspects in numerical
quadrature, Proc. of Symp. in Appl. Math., vol. 15: High speed computing and
experimental arithmetic, American Mathematical Society, Providence, R.I.

Birkhoff, G., and S. MacLane [1953]: A survey of modern algebra, rev. ed., Macmillan,
New York.

Brown, K. M., and P. Henrici [1962]: Sign wave analysis in matrix eigenvalue
problems, Math. of Comput., 16, 291-300.

Buck, R. C. [1956]: Advanced Calculus, McGraw-Hill, New York, Toronto, London.

Coddington, E. A. [1961]: An introduction to ordinary differential equations,
Prentice-Hall, Englewood Cliffs, N.J.

Comrie, L. J. [1961]: Chambers’s shorter six-figure mathematical tables, W. & R.
Chambers Ltd., Edinburgh and London.

Erdelyi, A. (ed.) [1953]: Higher transcendental functions, vol. 1, McGraw-Hill, New
York, Toronto, London.

Forsythe, G. E. [1958]: Singularity and near singularity in numerical analysis, Amer.
Math. Monthly, 65, 229-240.

—— and W. Wasow [1960]: Finite difference methods for partial differential equations,
Wiley, New York.

Fox, L. [1957]: The numerical solution of two-point boundary problems in ordinary
differential equations, Clarendon, Oxford.

Goldberg, S. [1958]: Introduction to difference equations, Wiley, New York.

Henrici, P. [1956]: A subroutine for computations with rational numbers, J. Assoc. 
Comput. Mach., 3, 10-15. 


—— [1958]: The quotient-difference algorithm, Nat. Bur. Standards Appl. Math.
Series, 49, 23-46.

—— [1962]: Discrete variable methods in ordinary differential equations, Wiley,
New York.

—— [1963]: Error propagation for difference methods, Wiley, New York.

—— [1963a]: Some applications of the quotient-difference algorithm, Proc. Symp.
Appl. Math., vol. 15, High speed computing and experimental arithmetic, American
Mathematical Society, Providence, R.I.

Hildebrand, F. B. [1956]: Introduction to numerical analysis, McGraw-Hill, New York,
Toronto, London.

Householder, A. S. [1953]: Principles of numerical analysis, McGraw-Hill, New York
and London.

Huskey, H. D. [1949]: On the precision of a certain procedure of numerical
integration, with an appendix by Douglas R. Hartree, J. Res. Nat. Bur. Stand., 42,
57-62.

Jahnke, E., and F. Emde [1945]: Tables of functions with formulae and curves, Dover, 
New York. 

Kantorovich, L. V. [1948]: Functional analysis and applied mathematics, Uspekhi
Mat. Nauk., 3, 89-185. Translated by C. D. Benster and edited by G. E.
Forsythe as Nat. Bur. Stand. Rept. No. 1509.

Kaplan, W. [1953]: Advanced Calculus, Addison-Wesley, Cambridge. 

Lanczos, C. [1955]: Spectroscopic eigenvalue analysis, J. Wash. Acad. Sci., 45, 
315-323. 

Liusternik, L. A., and W. I. Sobolev [1960]: Elemente der Funktionalanalysis (trans- 
lated from the Russian). Akademieverlag, Berlin. 

McCracken, D. D. [1961]: A guide to FORTRAN programming, Wiley, New York.

—— [1962]: A guide to ALGOL programming, Wiley, New York.

Milne, W. E. [1949]: Numerical Calculus, Princeton University Press, Princeton, N.J.

—— [1953]: Numerical solution of differential equations, Wiley, New York.

Milne-Thomson, L. M. [1933]: The calculus of finite differences, Macmillan, London.

Muller, D. E. [1956]: A method for solving algebraic equations using an automatic
computer, Math. Tables Aids Comput., 10, 208-215.

National Physical Laboratory [1961]: Modern Computing Methods (Notes on Applied
Science No. 16), 2nd ed., H.M. Stationery Office, London.

Naur, P. et al. [1960]: Report on the algorithmic language ALGOL 60, Comm. 
Assoc. Comp. Mach., 3, 299-314. 

Nautical Almanac Office [1956]: Interpolation and allied tables, H.M. Stationery 
Office, London. 

Ostrowski, A. [1940]: Recherches sur la méthode de Graeffe et les zéros des polynomes 
et les séries de Laurent, Acta Math., 72, 99-257. 

—— [1960]: Solution of equations and systems of equations, Academic Press, New 
York. 

Rademacher, H. [1948]: On the accumulation of errors in processes of integration 
on high-speed calculating machines. Proceedings of a symposium on large 
scale digital calculating machinery, Annals Comput. Labor. Harvard Univ., 16, 
176-187. 

Richardson, L. F. [1927]: The deferred approach to the limit, I—Single lattice, 
Trans. Roy. Soc. London, 226, 299-349, 

Rutishauser, H. [1952]: Ueber die Instabilitat von Methoden zur Integration gewéhn- 
licher Differentialgleichungen, Z. angew. Math. Physik, 3, 65-74. 

—— [1956]: Der Quotienten-Differenzen-Algorithmus, Mitteilungen aus dem 
Institut fur angew. Math. No.7. Birkhauser, Basel and Stuttgart. 


324 bibliography 


Rutishauser, H. [1963]: Ausdehnung des Rombergschen Prinzips, Numer. Math., 5,
48-53.

Schwarz, H. R. [1962]: An Introduction to ALGOL, Comm. Assoc. Comp. Mach., 5,
82-95.

Sokolnikoff, I. S., and R. M. Redheffer [1958]: Mathematics of Physics and modern 
Engineering, McGraw-Hill, New York. 

Speiser, A. P. [1961]: Digitale Rechenanlagen, Springer, Berlin. 

Steffensen, J. F. [1933]: Remarks on iteration, Skand. Aktuar. Tidskr., 16, 64-72.

Taylor, A. E. [1959]: Calculus with analytic geometry, Prentice-Hall, Englewood
Cliffs, N.J.

Todd, J. (ed.) [1962]: Survey of numerical analysis, McGraw-Hill, New York. 

—— [1963]: Introduction to the constructive theory of functions, Birkhäuser, Basel
and Stuttgart.

Watkins, B. O. [1964]: Roots of a polynomial using the QD method. To be 
published. 

Watson, G. N. [1944]: A treatise on the theory of Bessel functions, 2nd. ed., University 
Press, Cambridge. 

Whittaker, E. T., and G. Robinson [1924]: The calculus of observations, Blackie, 
London. 

Wilkinson, J. H. [1959]: The evaluation of the zeros of ill-conditioned polynomials, 
Numer. Math., 1, 150-180. 

—— [1960]: Error analysis of floating-point computation, Numer. Math., 2, 319-340.

—— [1963]: Rounding errors in algebraic processes, National Physical Laboratory Notes
on Applied Science No. 32, H.M. Stationery Office, London. 


INDEX 


Absolute value, of complex number, 21

Adams-Bashforth method, 276 ff., 283, 
285 f., 318 

Adams-Moulton method, 280 ff., 318 

Aitken’s Δ²-method, 72 ff., 151, 174,
237 

Aitken’s Lemma, 204 

Algebra, fundamental theorem of, 34 

ALGOL, 9 

Algorithm, 4 

stable, 314 

Argument, of complex number, 21

Asymptotic expansion, 239 

Averaging operator, 225 


Bairstow’s method, 110 ff., 176 
Base, of number system, 291 
Basic unit, 301, 303 
Bernoulli's method, 146 ff. 
stability of, 320 
Bessel function, 177 f., 189, 190, 203, 
211, 223, 229, 230, 235, 237, 258 
Bessel’s interpolation formula, 226, 235, 
249 
Binomial coefficients, 26, 40 f., 52 f., 56, 
140, 142 f., 204, 214 ff., 253, 308, 
320 
Binomial theorem, 41, 140 
generalized, 251 
Bolzano-Weierstrass, theorem of, 5 


CDC 1604 computer, 297 
Chebyshev polynomials, 124, 144, 194 
Circular frequency, 160 
Constructive method, 3, 264 
Complex conjugate number, 24
Complex number, 14 
root of, 27 
Complex plane, 18 
Continued fraction, 177 
Continuity, uniform, 193 
Contracting map, 99 
Convergence, 
acceleration of, 10, 70, 116, 149, 152, 
174, 237 ff., 243, 259, 272 f.
of algorithm, 9 
linear, 75, 104 
quadratic, 76, 103 f. 
Conversion, of binary numbers, 292 ff.
Corrector formula, 281 
Cramer’s rule, 107 


Dedekind section, 6, 169
Deflation, 86, 162 
Derivative, 

of polynomial, 40 

evaluation of, 54 

total, 266 

Determinant, Wronskian, 125 f., 129,
133, 136 ff. 




Difference equation, 
homogeneous, 49 f., 50, 120, 135 
linear, 48, 119 ff., 284, 310
non-linear, 243, 282 

Difference operator, 
backward, 141 
central, 225 
forward, 72, 214 

Difference table, 142, 304 

Differences, 
accumulation of error in, 302 
backward, 248 
central, 249, 285 

Differential equation, 44, 263 
numerical solution of, 263 ff. 

by methods based on integration, 
276 ff. 
by methods based on Taylor’s for- 
mula, 267 ff. 
propagation of error in, 317 
solution of, 263 
Differentiation operator, basic, 235 
Discretization error, 286 


End correction, in trapezoidal formula, 
257 
Error, 
asymptotic formula for, 10 
bound for, 10 
Error terms, oscillatory, 286 
Euler method, 267, 318 
Euler-Cauchy method, 267 
Everett's interpolation formula, 226 
Expected value of random variable, 306 
Exponent, of floating number, 299 
Exponential function, definition of, 31 
Extrapolation to the limit, 239 f., 271 f., 
288 


Fibonacci sequence, 127, 148, 164, 319 
Fixed point, of iteration, 63 
Fixed point arithmetic, 297 
Floating point arithmetic, 299, 312 
FORTRAN, 9, 300 
Fraction, binary, 293 
Function, 
analytic, 261 
domain of, 30 
periodic, integration of, 259 
range of, 30 
sufficiently differentiable, 265 


Gauss’ backward formula, 223 
Gauss’ forward formula, 223 
Gaussian quadrature, 255, 262 
Generating function, 251 
Graeffe’s method, 161


Horner’s scheme, 51, 85, 316, 321 


IBM 1620 computer, 297 
IBM 7090 computer, 297, 299 
Imaginary part of complex number, 15 
Imaginary unit, 13 
Incremental form, of numerical for- 
mula, 88 
Influence coefficients, 311 
Initial value problem, 
for difference equation, 48 
for differential equation, 264 
Instability, numerical. See Stability, 
numerical 
Interpolating polynomial, 183 ff. 
error of, 186 
existence of, 183 
representation of, 201, 207, 208, 221, 
223, 225, 226 
finalized, 218 
Interpolating polynomials, sequences of, 
191 
Interpolation, inverse, 209 ff.
Interpolation coefficients, 
for Everett interpolation, 227 
for Lagrangian interpolation, 184 
normalized, 201 
Iteration, 61 ff. 
inner, 282 
propagation of error in, 313 
Iteration method, 
convergence of, 
for functions of one variable, 65 
for functions of several variables, 
99 


Jacobian matrix, 104 


Kepler's equation, 67, 68, 70, 92 


Lagrangian interpolation coefficients, 
184 
normalized, 201 
Legendre function, 253 
Leibnitz formula, for differentiation of 
product, 42 
L’Hopital’s rule, 186, 188 
Linear combination, 130 
Linear dependence, 128, 136 
Lipschitz condition, 63, 100, 265, 270, 
282 
Lipschitz constant, 63, 101, 270, 278, 
282, 283 
bound for, 64, 101 
Logarithms, calculation of, 242 
Loss of accuracy, 243 


Machine number, 297, 313 

Mantissa, of floating number, 299 

Mean value theorem, 64, 70, 75, 80, 94, 

188, 310 

Midpoint formula, 251, 285 f., 318 
applied to differential equations, 319 

Midpoint value, 260 

Missing entry in table, 192 

Modulus of complex number, 21

Moivre’s formula, 26 

Muller’s method, 88, 194 ff., 212 

Multiple precision, 301 


Nabla, 141 
Neville’s algorithm, 207 ff. 
Newton-Cotes formulas, 255 
Newton’s backward formula, 223, 248,
276
Newton’s forward formula, 223, 254 
Newton's method, 76 ff., 175 
applied to polynomials, 84 
non-local convergence theorem for, 
79 
for systems of equations, 105 ff. 
Norm, of vector, 98 
Normal probability distribution, 306 
Number system, 291 
binary, 292 
Numerical analysis, definition of, 3 
Numerical differentiation, 231 ff., 242, 
245
error of, 232 
for equidistant abscissas, 233 




Numerical integration, 246 ff. 
error of, 247 
over extended intervals, 254 
using backward differences, 248 
using central differences, 249 


Operator, linear, 120 
Overflow, 297


Pascal’s triangle, 53 
Polar representation of complex num- 
ber, 22 
Polynomial, 33 
approximation of, by polynomial of 
lower degree, 193 ff. 
characteristic, 121 
degree of, 33 
leading coefficient of, 33 
real, 33, 38 
representation by linear factors, 35 
zero of, 34 
Polynomial equation, 6 
Predictor formula, 281 


Quadratic factors, 39 
determination of, 108 ff. 
Quotient-Difference algorithm, 162 ff. 
computational checks for, 174 
convergence theorems for, 166 
progressive form of, 170 




Random variables, 12, 305 

independent, 307 
Range, 

of function, 30 

of machine, 297 
Real part of complex number, 15
Regula falsi, 87, 213 
Representation, correctly rounded, 297 
Rhombus rules, 163 
Rolle’s theorem, 187, 224 
Romberg integration, 255, 259 ff. 
Roots of unity, 28, 37 
Rounding, 10, 302 
Rounding error, 169 

accumulated, 302, 309 

local, 302 

definition of, 309 




Runge-Kutta method, 
classical, 275, 277, 281 
simplified, 274 
Runge-Kutta-Taylor method, 275 


Scaling, 298 
Schwarz inequality, 101 f. 
Sequence, 
monotonic, 80, 94 
of real numbers, 46 
of vectors, 99 
Sign waves, 158 
Simpson’s rule, 251, 287 
applied to differential equations, 
319 f.
Solution, 
of difference equation, 46 f. 
general, 122 f.
particular, 131 ff. 
trivial, 121 
of differential equation, 44 f.
of equation, 62 
Square root, computation of, by New- 
ton’s method, 81 
Stability, 
numerical, 11, 168, 302, 314 
of methods for solving differential 
equations, 283 f., 317 
of numerical differentiation, 242 
of quotient-difference algorithm, 
169 
of solutions of difference equations, 
128 
Standard deviation, 306 f. 
Starting error, 277, 279, 286 


Starting method, 278 
Statistical theory of rounding, 305 
Steffensen iteration, 
for single equations, 91 
for systems, 115 
Stirling’s formula for n!, 192, 308 
Stirling’s interpolation formula, 225, 
233, 250 
Stochastic model of rounding, 305 
Sum, evaluation of, 312 


Table, well interpolable, 190 

Taylor algorithm, 267, 270, 271, 272 

Taylor expansion, 
for solving differential equations, 265 
of polynomial, 54 

Taylor’s formula, 270, 272 

Theorem, role of, in numerical analysis, 

10 

Throwback, 227 ff. 

Trapezoidal rule, 250, 255 ff., 259 ff. 
with end correction, 255 

Trapezoidal value of integral, 256, 260 

Triangle inequality, 24, 99 


Variance of random variable, 307 
Variation of constants, 49, 133 
Vector notation, 98 


Weierstrass, theorem of, on polynomial 
approximation, 192 
Whittaker’s method, 87 


Zero, of polynomial, 34 
dominant, 146 
multiplicity of, 36 


ANSWERS FOR PROBLEMS 
Chapter 1 


1) If n > 1 is the given integer, divide n by all
integers m from 2 to the greatest integer not exceeding
√n. n is prime if no such m divides n without remainder.
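
The trial-division procedure of answer 1 can be sketched in a few lines. This is only an illustration, not part of the original answer: the function name `is_prime` is hypothetical, and Python is used in place of the ALGOL or FORTRAN discussed in the text.

```python
import math


def is_prime(n: int) -> bool:
    """Trial division: test every divisor m with 2 <= m <= floor(sqrt(n))."""
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    # math.isqrt(n) is the greatest integer not exceeding the square root of n.
    for m in range(2, math.isqrt(n) + 1):
        if n % m == 0:
            # m divides n without remainder, so n is not prime.
            return False
    return True
```

Stopping at √n suffices because any factorization n = m·k with m ≤ k forces m ≤ √n, so the algorithm terminates after finitely many divisions.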

2) Algorithm infinite because ΓΞ irrational, hence 
decimal fraction neither terminating nor periodic. 

3) Existence of √a for real a > 0: proof of the
fact that every bounded monotone sequence has a limit;
existence of supremum and infimum of a bounded set of
real numbers.


Chapter 2 


1) a)+ da Ὁ) τ΄ c) cos2y + i sin 2 
ἃ) cosm +i sing e)l. 
2) Not defined for 2 = -l1, 2 #0. 


3) ie zn+l 
Ὁ ἘῈΣ τὰ 
4.) 8) 2, φ-- ι dkw Ὁ) 10, φ = 126°45' + k 360° 
eyo? φ ee ὅκα. 


5) argw = op + okt , lwl= #2 =1 

8) If k = 1, then locus is imaginary axis. If k ≠ 1,
let Le Ke 

Ξ 1... καὶ 
The equation is 
oS = ate 4+ 2) #2] =O 
or in real form 
Ἐπ ose" χν ὃ 

(circle with center at (ᾳ, 0) and radius ας - 1). 


9) a ἐπ τρί ὦ Sm 
ic ἃ ἢ κα Am.+ 1 
ΕΣ i «= 4 if n = 4m + 4 
10) ¢ we) ig Rh wy 2, cas Son, where 
@ = aaa 
τ or a: [1 


τ) An 7 2) f(x) =M- ¢sinx maps [m - TT, 0 +7] into itself 
- n+l Ante and satisfies Lipschitz condition with L = [6] 41. 
ὦ l1-2rcosp +r 4) fiz) =. ὃ £x $1. Σ' is unbounded! 
5) x = 0.96444488 
12) ak & 2S 4 ) aoa 
b) exp i(e + 2 ae) We Oy κ᾿ ae 5 τ "22 Sire 2 
T2 a “538. ues. : 9) Any M such that - § <M < O will do. | 
ey ἃς Siete : as Sue 5 10) s = 2. For x 20, y2 0, | f(x) - f(y)| ‘se | x ‘i γί , 
13) cos4¢ = 8(cos¢ + - 8(cos¢ 5° +1 11) [=\6 = s | 2.1 210"? . 
14) vx + 1y = 13) Prob. 7: χὶς = 1.839286 , Prob. 5: χὰ = 0.964318 
ne 16) 2.766204 
t j X+_5X + Fo ae (signy i [3Ξ5Ξ:" 55 τς τς | 19) M = -- EG) 
| , d " : 
15) ἐπί cos 7p - σοϑδῳ - 3 cos ϑῳ - 3 cosy ) a0) 12m nei _ EG 
18) a) ellipse with semiaxes [8 + bj] and [8 - Ὁ| ate 2 
Ὁ) parabolic are x = 2y° «71. -1 $y $1 21) f(x) =x x42 : X, = 3.16216216, x, = 3.162277660 
822) 4ἃ)51 521 Ὁ) 5 251 ο) 51 521, 522: ax + 5 
24) a = 4 = 5 = GO, a = 4 , 22) x = 0.5671444 
Titers 85 4 gc . 
24) three. 23) War = 1.772453851 
25) two, e4 ) #1 a 0.4, x - 0.22, wi sed 0.3553, 2 = 0.93955955, eee 
(x, has 2” 3's). 
Chapter 4 " 
57) h(x) = - 55 Ξ 
1 = abt! 1 8) x = 1.839286 
1) x, = “fo oo ᾿ ΙΝ x, = ot 2 ) x=l. 50 7 
| n! 2° : 29) x = 0.9024738 
2) ad ie see + Fr) 32) χὰ δ" : x, 20, ἢΞ1, 
a) &,. * by + Ὁ) + ++. + DB, = n-th partial sum of series. 34) 1.782191 
Tt 2 35) 34.16227766 (sequence identical with that obtained 
+) X, * eel (1 - a by Newton's method, if starting values are the same). 
3 2 2 5350) s = 1.4655712 
8) p(x) = 2.392 - 4.92(x + 0.3) - 3.6(x + 0.3)° + 4(x + 0.3) 
9) P(1.5) = = 22.625, P't1.5) = « 37.625, P"(1.5) = = 39 Chapter 5 
Chapter = 2) x = 0.7718445 , y = 0.4196434 


8) x = 4.9322865 , y = 5.0673188 


1) a) [e'Ge)| = [2 sinx] $2, thus a - 2 9) x = 1.0309038 , y = 1.0033246 : 
Ὁ) [τ᾽ (χ}} = 5 , thus Ὁ - ὁ 10) 3.0053220 + i 3-9963692, 0.9946779 + 1 1.0248789 . 
c) [f'(x)| " 4, , thus | £' («| « Σ for x 2 2, 11) Quadratic factors x” + 1.5415682x+ 1.4873158, 
x 


x? - 121272386 x + 0.8316443, real zeros - 1.8620342, -0.7236362. 
thus L 


ἢ 
- ir 




2) x = 0.933012 

5) 2 = 4.189791 

6) χὸ = 0.933013, x1, = 0.933011 

7) [2;3| = [25 (= γξ) (compare condition [22 |>]25 | 
at top of p. 150) 


μ᾿ 

.Ὡ» 
μὴ 

teal 
bd 
| 


" Ba ¢5(-1)" 
Ὁ) x, = cy coset # Cp sino 
(-1)" (0, + 658) 
2) x, = αἡ [-i(2 - v2]? + 55 [-i(1 + v3] 
5) x= τ (ον cosnpy + C5 Sinng ), where 
κὰν: νΌ Ξ ΟΣ inode ον 


ΤῈ 1+ 6 bs By, te 5-ὶ 26) 


d) x aa 


Oo 
ἧς ὦ» 
tal 
Ι 


10) Dominant zero has multiplicity 2. 


7) 8) 16) Dominant zeros are 2 = = 2. 
8) = 
17) In the left half-plane. 


| 


cost is solution of x > ex, 3 + ex. = 18) All three zeros are dominant. 


: n n 
12) a) Ὡς ἃ 6, F + On 2 + 1 


| Chapter 8 
τν Fok Sa) * Re Chapter 8 
n 
ἃ) Ἐς 3.80 2° + C5 Geta" zn“ - 3n - 4 1) Scheme cannot be generated in a stable manner. If 
n 
13) x, = 255° 4 Bebe 4 1 constructed according to §8.5, scheme is stable and yields 


the geros  - 2:2 3, % = 2, 


OSes mids Pr tg) + ghee” 4 1s 5 
x, ~1is *o - ἧς 2 ‘ 


" 
2) ek) = 1. ἢ 2 Oo, Καὶ δ: ὃ: gi? =n+ kyon 20, ΚΕ > O. 
is) y, - [1 4 5 Ὁ 75 sinB Joy hun. - χολδ | 


1 2 ἢ 
5) a). ae +g’ (1) an: 
7 In + gu » &y ""- n n+ } 
every 8 years; if a and Ὁ are non-negative, the condition ΕἸ +a) τοῦ ) 


16) =. - na? fps at 2 ee ee . 1 bt 
a" Soe πῆς : τ # a Bre 5 1 = 8. He-itad 


50) 8 Fie τ 6x, Sap ie | - 6 53 7 4 


= 0.066987 . 


Pasty PoP τὰ πῶ 7) Spay to? 12) 0.9305682, 0.6699905, 5 ances. SR, 
5) πὸ - Mp1 + 32Xp__la - 58 κ.. 9 Ὁ 57Ky_y 14) Real zeros: 4.70967, -4.41484, 1.00608, quadratic 
a ere ee. factor has zeros 1.85562 = i0.544194. 
ἐ n+l i ; 16) a) 3.0053221 = i 3.9963933, 0.9446779 = i 1.0248789 
(V'R(t)) 5 = δε - 19° αὶ (8). 


b) 5.7074300, 1.0725870 = i 0.4775376, 3.5736981 


2 Wke(x) = 1 - “hyk ας 
3) (soe {t= 6 "7's Σ i 0.1.5751498 


κ 
2 » Π ἶ Hie 
9) Mya 7 oy CHL) (ὦ Vox, 5) 0.1700599, 0.9024738, 0.7351475 t i 0.9018407, 


-0.6464144 Φ i 0.8862372. 
Chapter 4 


1) x - 0.59335 


Chapter 11 
Chapter 9 


| 1) Define f arbitrarily for 0 $x < h and set (for 
2) No; if f(x) = ¢c, the polynomial P(x) = ¢ interpolates 
he καὶ i) fle ὦ ph} = ἃ τ οὐ" f(x), 03 = 2h, eS δ, = 1, 
at arbitrarily many voints. τ 

fi - ty coe « DOP Be = ~l1,f(x) = 0 is the only solution. 
7?) From x = 1° 42' onward. nx h — h 1 

| 5) Ave = (e - l)ve . Use === v1. 

8) From x = 16 onward.


10) Convergent for ἢ ς log2. 
9) a) ἢ 8 0.003 Ὁ) ἢ = 0.08 suffices. 


10) | f(x) 


P(x)| £ 0.063 κ΄ M, , 0 £ x £h. Chapter 12 


11) Th rror is at least 1; Error = (6 - , atl : 
) @ e€rio 5 1) ' (Ο᾽ a P'(0) r — (O41) (gy (-n)® 6 < 70 4s nh. 


13) Choose Xp = Ss Seas Ὁ mag .. Sieh | 2) By preceding problem, error is at least ote 
onag, [BOO = 8. - | 5) (0) = $0) 5 £m) , Bong) | ne Ἐς he. 
| 15) For h¢ 1. 4) PL(o) = § [δέν - $478 8 - oe 4 {πο lal A*:J. 
| 18) P(x) = Q(x) - (253)" 80 cos( ἢ δροοοβξξτθτβ) 2.1.1 : 7) (2-7): + » (12-8): 26 ‘ 


ah ἢ = 6.6185 « 
11) 1.667 — and 1.889 -- ᾿ 


| 43) (as stated) J (2.4068 ) Ὁ -0,00101 


(corrected) Jj(2.4048) w 0.00002 | 

. 1 | 3) a) £69) (x,y) = y + (mn + 1)e* 
10) f(s) ~ 0.6718 

Ὁ) £6) (x,y) 


| 
| [β) - οὐ 5 BS)? [ag | 2.5} 
19) P(x) = 10x? - 35x? + 50x? - 25x + 2 <a 
20) P(x) = 3x - ax + a (Jerror| $ -- ) 
21) P*(x) = tz x - 2s (jerror] 5 ΡΞ + 5 Ξ + - ΠΝ Chapter 13 
Sob es fee SS i a er Oe | 3 
| 22) P(x), = x 515 ({error| 3f3 57196) | 1) actual error = 2; error bound = το ας, 
| εν κ 
β | Chapter 10 , : | 2 a, t= log(l + Ὁ) 
| 1.85194704 
| | 1) See example 8-6. i 3) az 
| 2) See vroblem 8-1l6a. Chapter 14 


1) 8) yes, Lb = 1 6) yes, L = 1 &) nO 4) Ho 


iT 


ΡΒ (6) y, where P. = volynomial of 
11) See problem 10-4 n . n 


degree ἢ satisfying recurrence relation P,(z) = z, P. .(z 
13) vo ~w aap - 1417S... 162) n+1 62) 


= 2 [ Pt(z) + P(z)] . 


14) Ten extrapolated terms give only 1.646109. 
15) x = 5.5200299


5) h solution
0.5 0.81250 
0.125 0.67414 


8) Twice extrapolated value 0.65524. 


11) h y_n
0.4 0.304405 
0.2 0.304588 


Extrapolated value: 0.404600. 


Chapter 15 


1) 11010010101, 11011101010, 11100111010. 

S\ bw 2s Gwe", Mae 

3) 1/log 2 = 3.3219...

4) 10 = 101, 100 = 10201, 1000 = 1101001 . 
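
The three digit strings in answer 4 are consistent with base-3 representations of 10, 100 and 1000. Assuming that is what the problem asks for (the problem statement is not reproduced here), the conversion by repeated division can be sketched as follows; the helper name `to_base` is hypothetical.

```python
def to_base(n: int, base: int) -> str:
    """Digits of a non-negative integer n in the given base (2 to 10)."""
    if n < 0 or not 2 <= base <= 10:
        raise ValueError("need n >= 0 and 2 <= base <= 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        # The remainder is the next digit, least significant first.
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))


for value in (10, 100, 1000):
    print(value, "=", to_base(value, 3))
# prints:
# 10 = 101
# 100 = 10201
# 1000 = 1101001
```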

5) N = 10 

6) π = 11.001001000100

7) a) 55. Ὁ) 55 

8) a) 3 = 0.01 Ὁ) ἢ = 0.13 6) 5 = 0.0 

9) f2 = 1.01101010... , Fe - 2.2200020..... -: 
5 


10.001111000... 


Chapter 16 


1) 3.751079 ; No. 




